The generation game


Birth-cohort studies offer invaluable data on the links between childhood development and later 
life, but today’s efforts could learn something from a pioneering project that turns 65 this week. 


Every now and then, Britain creates something it can really be proud
of. The Beatles, fish and chips, cream teas and pubs tend
to rank high in polls, as can the Royal Family, particularly
with wedding bells in the air. But ask epidemiologists, and they will
probably praise a lesser-known British achievement: birth-cohort
studies, the observation of groups of people from birth onwards.

This week, members of the oldest British birth cohort, all born
in one week in March 1946, will celebrate their 65th birthdays (see
page 20). They are part of the longest-running human experiment of
its type, an endeavour that — along with later generations, including
cohorts born in 1958, 1970 and at the turn of the millennium — is the
envy of researchers around the world. The cohorts offer important
lessons for scientists who want to launch similar efforts today, as well
as for politicians who question the merits of funding such work. The
1946 cohort shows, in stunning detail, how long-term studies can pay
off. It has provided a treasure-trove of data linking early
socioeconomic status, health and development to later events, such as
disease, educational attainment and well-being. And it is already starting
to show how genetics and a lifetime of experiences influence the ageing
process. Sometimes, the only way to understand human life is to study
it. This week, the United Kingdom announced that it will spend some
£33.5 million (US$54.5 million) over five years on cohort research,
including a new study of about 90,000 children.

Not all cohort studies receive universal praise. The National
Children’s Study in the United States, which is recruiting participants and
aims to track around 100,000 children from birth to age 21, has been
more than a decade in the planning, cost US$194 million in 2010,
and to its critics is a vast and overambitious data-gathering exercise
without clear goals. Plans for a British birth cohort in the 1980s were
vetoed by the Conservative government.

It is not just about money — the 1946 cohort, after all, has survived
on a hand-to-mouth basis for most of its existence. The study was
triggered by concerns about falling fertility rates in post-war Britain.
Its gung-ho leader, James Douglas, was able to contact and question
some 13,000 mothers who gave birth soon after the end of the
Second World War — and to publish influential results within two years
that prompted legislation leading to improved access to pain-relief
during childbirth. Such rapid data collection and response would be
impossible today, given the (often necessary) legal, ethical and
bureaucratic framework erected around research in the intervening decades.
Participants are now harder to recruit, and more likely to move away
or drop out. And as science has developed, so the hypotheses and
factors examined in modern cohort studies have proliferated. Gadgets
measure every pollutant breathed, calorie consumed or step walked in
pregnancy, and are accompanied by intelligence tests, studies of
behaviour and parenting style, and countless clinical tests and biomolecular
studies. The US National Children’s Study has suffered from spiralling
complexity and cost, partly attributable to investigators wanting to
measure every possible variable.

One way to avoid this kind of scientific paralysis is to follow new 
cohorts every ten years or so. Questions not asked of one group can 
then be held over for the next. The need to initiate cohort studies is 
particularly pressing at the moment, with the deep budget cuts taking 
place in the United Kingdom and elsewhere threatening to increase
financial, health and educational inequalities. How, except by following
those born during and immediately after the financial storm, can society
learn about the long-term social impacts of these changes over a lifetime?
Such questions are particularly urgent in the United States, and much will
be learned from the National Children’s Study,
but it ought to articulate a clear, science-based vision and prove that 
it can provide value for money. Does the study need the dozens of 
data-collection centres that it has scattered across the country, or can 
it be streamlined? Such questions must be carefully considered by 
politicians and scientists vying for a piece of the action. 

In return, those who run cohorts must share their rich data with 
suitable collaborators — while adhering to appropriate confidenti- 
ality standards — and ensure that results are disseminated widely, 
particularly to policy-makers. Lifestyles and science are both more 
complex in 2011, but studies of today’s children are just as valuable 
as studies of those born in 1946. Happy birthday to the Douglas 
babies — and here's to the next generation.


Invest to diversify 


Despite many federal initiatives, the number of 
US scientists from minority groups remains low. 


Minorities and other marginalized groups have not always
had the best relationship with science. In the 1930s,
researchers from the US government started a series of
experiments that recruited hundreds of African American men 
infected with syphilis, then left their disease untreated to study its 
natural progression. (The government did, however, provide free burial 
insurance.) More recently, American Indians from the Havasupai tribe 
sued Arizona State University in Tempe over claims that geneticists had 
collected and analysed blood samples from tribe members without 
obtaining proper consent. The two parties settled that suit last year. 
Indigenous peoples in other countries such as Australia also have 
historical reasons to be suspicious of mainstream scientists. 

For more than a decade, US leaders have been trying to move beyond
that troubled past and recruit minorities into science and engineering. 
There are strong moral arguments for doing so. But in times of massive 
budgetary shortfalls, morals do not guarantee funds. Congress and 
the public should recognize the powerful practical reasons to support 
programmes that aim to raise the numbers of minorities in science. 

A key issue is that of numbers. There is concern in the United States 
about the shrinking proportion of home-grown scientists. Foreign-born 
students, particularly from China and India, account for almost all of the 
growth in the number of science doctoral degrees granted in America. 
And many then take their skills back home. Minorities make up a grow- 
ing share of the US population and represent a relatively untapped pool 
from which to draw the next generation of scientists. 

They also bring fresh ideas to research. This sometimes results in the
pursuit of topics that can help specific communities but have not managed
to capture the attention of mainstream researchers. One such researcher
is Katie McDonald, who embarked on a research project as
an American Indian student at a tribal college in Montana. She found 
higher-than-expected levels of mercury in local fish and has helped 
her own tribe to avoid health problems (see page 25). 

Bringing more diversity into the ranks of researchers will help to 
overcome the lingering suspicion of science that persists in some 
minority communities. In doing so, it will encourage members of the 
public to accept the products of research, whether they are govern- 
ment health recommendations or reports about the changing climate. 
Without that kind of trust, researchers could see their work ignored 
by segments of the population. 

For these and other reasons, the US government has poured
substantial funds into pulling more underrepresented minorities
into science. The National Science Foundation spent more than
$110 million on this in 2010, and other agencies, such as the National
Institutes of Health, NASA and the US Department of Education, also
have programmes to boost minority participation in science.

These initiatives still have a long way to go. The National Research
Council (NRC) reported last year that underrepresented minorities made
up 28% of the US population in 2006 but accounted for only 9% of
college-educated Americans in the science and engineering workforce.

And in some cases, the numbers are proving hard to move. In 2008,
American Indians earned just 0.7% of the bachelor’s degrees awarded in
science and engineering — a proportion that is unchanged since 2000. The
share of science bachelor’s degrees earned by black students has also
stayed constant at 8.3%. For doctoral
degrees, the figures are even starker. American Indians, who represent 
1% of the population, earn only 0.3% of the PhDs in science and engi- 
neering. Black people make up 13% of the US population but accounted 
for just 3% of the doctoral degrees awarded in 2008 in these fields. 

The problem creates a vicious cycle. Similar proportions of minor- 
ity and white students enter university intending to study science. 
However, the completion rate for minorities is lower. Many factors 
contribute to this gap, according to the NRC, but one remains the poor 
diversity of university faculty members and the scarcity of role models 
in science for students from minority groups.


Dark rumblings 


The Large Hadron Collider is stirring up 
trouble, and that’s good news for science. 


In the 1860s, physics looked beautiful. The Scottish physicist
James Clerk Maxwell had just published a series of papers that
unified electricity, magnetism and light into a theory that could be
expressed in a few equations. In doing so, he settled a long-running
debate over whether light was a continuous wave of energy or a spray
of tiny particles. It was, to anyone who understood Maxwell’s work,
quite obviously a wave. That raised a question, although it seemed to
be more of a niggling detail to Maxwell’s devotees: like water waves or
sound, the new, electromagnetic light waves should need a medium
through which to travel. If Maxwell was right, what did it look like?

So began the search for the notorious ether. In one spectacular
experiment in 1887, Albert Michelson and Edward Morley designed
and built a prototype interferometer to measure the speed of light at
different points in Earth’s orbit and showed that the speed was constant
— impossible if light and Earth were flowing through an unseen liquid.
Contrary to all their expectations, the ether wasn't there.

There are some parallels between physics then and physics now.
Like the 1860s, the 1960s saw an incredible unification of modern
physical theories. This time, the standard model of particle physics
took Maxwell’s electromagnetic force and wove it with the strong
and weak nuclear forces. According to the theory, at sufficiently high
energies the weak and electromagnetic forces merge into a single,
electroweak force.

Like Maxwell’s theory, the standard model is powerful, but there
are some details that it can’t quite explain. One is dark matter, a so-far
undetected entity that makes up most of the matter in the cosmos.
Another is dark energy, a force that seems to be pushing the Universe
apart. There are some other unexplained odds and ends too, but
nothing formidable enough to push the standard model from its perch.

To deal with some of the problems, the best theorists of the day 
have proposed an extension of the model, known as supersymmetry. 
This modification unifies the electroweak force with the strong 
nuclear force, and suggests some elementary particles that might 
explain dark matter. 

Now, an experiment has come along to challenge the standard model 
and its offspring. The Large Hadron Collider (LHC), a 27-kilometre 
proton-proton collider on the French–Swiss border near Geneva,
Switzerland, is delivering a torrent of data that can be used to probe 
the boundaries of the standard model. But the collider has yet to find 
evidence of the particles suggested by supersymmetry theory (see page 
13). If it finds nothing in the next year, the theory will look like it is in 
serious trouble. If it finds nothing in two years, then many theorists 
will probably abandon it, just as theorists eventually had to abandon 
extensions of Maxwell’s work that explained away the missing ether. 

The parallels with history shouldn't be taken too seriously. The LHC 
is a much more elaborate experiment than the one done by Michelson 
and Morley. It uses proton collisions to probe unknown energies for 
all sorts of things, not just the supersymmetrical particles some hope 
it will find. Nor is the LHC likely to deliver a clear refutation of super- 
symmetry — the theory, the data and the analysis are all much more 
complicated than they were 125 years ago. 

But comparison can remind us of something that is easily 
overlooked: the negative results now coming out of the LHC should 
be just as stimulating as any positive findings. Michelson and Morley’s 
experiment, and others like it, eventually led Albert Einstein to 
axiomatically accept that light travelled at a constant speed and could 
be both a wave and a particle. Those revelations never really disproved 
Maxwell’s theories, but they did help to develop special relativity and
quantum mechanics — the two greatest theories of the twentieth
century. In the same way, the LHC’s results — whatever they may be —
could force scientists to think differently. If one beautiful theory can’t
explain the data, then there must be another out there somewhere that can.


Give postdocs a career, not empty promises


To avoid throwing talent on the scrap heap and to boost prospects, a new type
of scientific post for researchers is needed, says Jennifer Rohn.


The career structure for scientific research in universities is
broken, particularly in the life sciences, my own overcrowded
field. In coffee rooms across the world, postdocs commiserate
with each other amid rising anxiety about biology’s dirty little
secret: dwindling opportunity. Fellowships are few, every advertised
academic post draws a flood of candidates, and grants fund only a
tiny fraction of applicants.

The scientific job market has been tight for decades, but the recent
global recession and accompanying austerity measures have brought it
into sudden focus for young — and some not so young — researchers,
who face a widening chasm between their cycles of contract work and
a coveted lab-head position.

This is a familiar lament, but I also propose a solution: we should
professionalize the postdoc role and turn it into a career rather than a
scientific stepping stone.

Consider the scientific community as an ecosystem, and it is easy to
see why postdocs need another path. The system needs only one
replacement per lab-head position, but over the course of a 30-40-year
career, a typical biologist will train dozens of suitable candidates for
the position. The academic opportunities for a mature postdoc some ten
years after completing his or her PhD are few and far between.

Most fellowships are earmarked for youth and not applicable to
experienced postdocs. Landing a lab-head position requires a strong
publication record, which can be as much about luck as skill and hard
work. Rare ancillary research positions, such as technicians and
scientific officers, are frequently junior — or also on short-term
contracts linked to a grant. Competition for senior positions in industry
is just as fierce.

Beyond research, there are science-related jobs, such as in publishing,
grants administration and public engagement. But these positions
seldom require more than a doctorate, if that. And to force a highly
trained postdoc from research is a terrible waste of time and public
expense. The ageing postdoc may well struggle to make up for those
lost ten years when starting again in a different career. Meanwhile,
after many years of relatively low pay, they can be years behind in terms
of savings and pensions.

The scientific enterprise is run on what economists call the
‘tournament’ model, with practitioners pitted against one another in bitter
pursuit of a very rare prize. Given that cheap and disposable trainees —
PhD students and postdocs — fuel the entire scientific research
enterprise, it is not surprising that few inside the system seem interested
in change. A system complicit in this sort of exploitation is at best
indifferent and at worst cruel. I have no doubt
that most lab heads want the best for their many apprentices, but at the
system level, the practice continues. Few academics could afford to warn
trainees against entering the ring — if they frightened away their labour
force, research would grind to a halt.

NATURE.COM Discuss this article online at: go.nature.com/vflfh8

An alternative career structure within science that professionalizes 
mature postdocs would be better. Permanent research staff positions 
could be generated and filled with talented and experienced postdocs 
who do not want to, or cannot, lead a research team — a job that, after 
all, requires a different skill set. Every academic lab could employ a 
few of these staff along with a reduced number of trainees. Although 
the permanent staff would cost more, there would be fewer needed: 
a researcher with 10-20 years’ experience is probably at least twice as
efficient as a green trainee. Academic labs could thus become smaller, 
streamlined and more efficient. The slightly fewer trainees in the pool 
would work in the knowledge that their career 
prospects are brighter, and that the system that 
trains them wants to nurture them, not suck 
them dry and spit them out. 

An added benefit would be that instead of labs 
completely turning over every 4-5 years, with 
precious lore and knowledge lost along the way, 
they would have continuity. Fresh blood in a lab 
is useful, but so too are experienced people who 
can train others more efficiently, who are in touch 
with the latest techniques and who have first- 
hand knowledge of the lab’s carefully amassed 
treasure-trove of materials. 

Where should the cut-off be made to allow 
for the smaller number of trainees admitted? 
People with PhDs are useful to society, and are 
eminently employable in non-research fields. I 
would not necessarily advocate restricting their 
numbers, but every candidate should be given 
a realistic assessment of their chances of becoming a lab head. The 
model I propose would reduce the number of trainee postdocs infused 
into the system, and then apply market forces — much as medical 
schools in many countries regulate the number of trainees by using 
the principles of supply and demand. 

It won’t be easy. Staff positions are typically attached to a lab head’s
temporary grant, not to the institutes that house them. Finance and 
numbers will need to be carefully balanced. Universities would have 
to create new permanent positions, and be willing to fund them long 
term. But the first step is to admit we have a problem, and that the 
problem is worth tackling.


Jennifer Rohn is a cell biologist at University College London and 
editor of http://LabLit.com. Her most recent book is The Honest Look 
(Cold Spring Harbor Laboratory Press). 

e-mail: jenny@lablit.com 


RESEARCH HIGHLIGHTS

Selections from the scientific literature


A mammalian change of heart

Many fish can replace lost cardiac tissue throughout their lives, but
adult mammals cannot. Researchers have now discovered a stage very
early in life at which mammals can mend their own hearts through the
replication of cells called cardiomyocytes.

Hesham Sadek and Eric Olson at the University of Texas Southwestern
Medical Center in Dallas and their team surgically removed about 15%
of muscle tissue from the ventricle walls of 1-day-old mice. One week
later, they found evidence of cardiomyocyte proliferation in the heart.
The animals fully recovered their muscle tissue and organ function
within two months. The same procedure performed on 7-day-old mice
did not lead to cardiomyocyte proliferation or recovery.

This work may lead to new strategies for reawakening regeneration in
the adult mammalian heart after injury.
Science 331, 1078-1080 (2011)
For a longer story on this research, see: go.nature.com/io3ccw


Identifying reef fish at risk

More than one-third of coral-reef fish species in the Indian Ocean,
such as the butterflyfish Chaetodon trifascialis (pictured), could
become extinct in their local environment as a result of climate change.

Nicholas Graham at James Cook University in Townsville, Australia, and
his colleagues developed a method for predicting how vulnerable species
are to local extinction, taking into account variables such as how picky
the fish are about their food or habitat.

The researchers found that 56 of the 134 fish species studied were at
risk of losing their habitat, shelter or food sources as a result of
climate change. Interestingly, those fish at greatest risk from climate
change were not the same as those at greatest risk from overfishing. The
predictions could be used to better manage animal populations and
habitats, helping to ensure survival under climate change and other
pressures.
Ecol. Lett. doi:10.1111/j.1461-0248.2011.01592.x (2011)


More sneezing in a warmer world

Climate change is bad news for people with allergies: a warmer climate
means a longer pollen season. In just 15 years, the pollen season of one
common allergen has lengthened by as much as 27 days in some parts of
North America.

The prevalence of allergies is increasing in the United States, but
linking this increase with climate change has been a stretch. A team led
by Lewis Ziska at the US Department of Agriculture in Beltsville,
Maryland, compared readings of ragweed pollen (pictured) since 1995 at
10 stations across North America with changes in temperature and first
frost. They found a clear link between recent warming and the length of
the pollen season. What's more, the farther north they looked, the
greater the extension to the season — so allergy-prone Canadians should
consider buying tissues in bulk.
Proc. Natl Acad. Sci. USA doi:10.1073/pnas.1014107108 (2011)


Think of yourself 
when quitting 


In smoking-cessation 
programmes, cognitive 
therapy is more successful
if it is tailored to individuals
than if it is applied generically.
The difference may lie in the
recruitment of brain areas
activated by thinking about
oneself, scientists have found. 

Hannah Faye Chua and her 
colleagues at the University 
of Michigan in Ann Arbor 
presented would-be quitters 
with messages relevant to their 
lives and the obstacles they 
perceived to changing their 
smoking behaviour, while 
scanning their brains with 
functional magnetic resonance 
imaging. 

Activation of the 
dorsomedial prefrontal cortex, 
an area that is activated when 
people think about themselves, 
was correlated with how 
likely participants were to 
have stopped smoking four 
months after the scanning.
This correlation was not seen
when the patients were given
non-tailored messages about
smoking during scanning.
Nature Neurosci. doi:10.1038/nn.2761 (2011)


BIOLOGY 


Predators trigger 
plankton stealth 


Tiny water-dwelling organisms 
called phytoplankton can 
adopt a ‘stealth’ mode to avoid
the attentions of predators. 
Many types of plankton 
group together into chains, 
and some respond to grazers 
by increasing their group 
size until the chains are too 
large to eat. Erik Selander of 
the Technical University of 
Denmark in Charlottenlund 
and his colleagues show 
that predators can trigger 
the opposite response in 
Alexandrium tamarense. 
When exposed to small 
plankton-eating crustaceans 
called copepods, chains of 
Alexandrium adopt stealth 
behaviour, splitting into single 
cells or very short chains and 
swimming more slowly. The 
phytoplankton drastically 
reduces its encounters 
with grazers through this 
mechanism, the authors 
report. 
Proc. Natl Acad. Sci. USA 
doi:10.1073/pnas.1011870108 
(2011) 


EPIDEMIOLOGY 


Farm kids benefit 
from microbes 


Exposure to diverse microbes 
could explain why children 
who grow up on farms are less 
likely to develop asthma than 
their suburban counterparts. 
Previous work showed that 
children raised on farms are 
protected from childhood 
asthma and a class of allergic 
reactions called ‘atopy’. Now,
Markus Ege of the University 
Children’s Hospital Munich in 
Germany and his colleagues 
have analysed the microbial 
populations in dust collected 
from 933 children’s rooms. 
They found that bacteria and
fungi were more numerous
and widespread in samples 
collected for children who 
live on farms. They also 
found that the risk of asthma 
and atopy decreased as the 
number of microbial taxa 
increased. In particular, fungi 
from two genera, Eurotium 
and Penicillium, were tightly 
associated with reduced 
asthma risk. 

N. Engl. J. Med. 364, 701-709 
(2011) 


CLIMATE CHANGE 


Sea-ice models 
don’t measure up 


Climate models do a poor job 
when it comes to simulating 
sea-ice change in the Arctic. 
Michael Winton of the 
Geophysical Fluid Dynamics 
Laboratory in Princeton, New 
Jersey, compared data from 
the era of satellite observations 
and five state-of-the-art 
climate models of Northern 
Hemisphere sea-ice cover. 
All of the model simulations 
considerably underestimated 
the observed sea-ice decline. 
Substantial natural 
variability in the annual sea ice 
would be necessary to explain 
the discrepancy between 
observations and even the 
best-performing model. It 
is more likely that current 
climate models are not nearly 
sensitive enough to accurately 
gauge the behaviour of sea ice 
in response to warming, the 
authors say. 
J. Clim. doi:10.1175/ 
2011JCLI4146.1 (2011) 


GENETICS 


Clues from big-hearted mice

Mice bearing the mutations underlying two human heart syndromes have
pointed the way to possible treatments. Noonan and LEOPARD syndromes
both cause short stature, facial deformities and abnormally thick hearts
that cannot pump properly.

Benjamin Neel and Toshiyuki Araki of the Ontario Cancer Institute in
Toronto, Canada, and their co-workers engineered mice to have the
mutation in the Raf1 gene that underlies Noonan disease.

In addition to features of the human syndrome, the mice had increased
activity of the Mek protein. Pups given a Mek inhibitor started small
but they grew faster and caught up with normal mice by a couple of
weeks after birth.

Meanwhile, Neel and Maria Kontaridis of Harvard Medical School in
Boston and their colleagues inserted into mice the mutation in the
Ptpn11 gene that causes LEOPARD syndrome. The activity of a protein
called mTor was abnormally high in these mice, and giving them the mTor
inhibitor rapamycin repaired heart defects.
J. Clin. Invest. doi:10.1172/JCI44929 (2011)
J. Clin. Invest. doi:10.1172/JCI44972 (2011)


COMMUNITY CHOICE

The most viewed papers in science

Dogs keep an eye on their owners

HIGHLY READ on elsevier.com up to 21 February

Dogs are famously good at reading human body language, following human
gaze and stealing human food. But not all humans are equal in the eyes
of Canis familiaris. Paolo Mongillo and his colleagues at the University
of Padua in Italy investigated the attention dogs paid to their owners
and to strangers. Each dog watched as its owner and a stranger walked
back and forth across a test room in opposite directions, popping in and
out of two doors.

Not surprisingly, the dogs kept their eyes on their owners most of the
time, and stared at the doors they had gone through. At least, young
dogs did. Dogs over the age of seven didn't stare at the door their
owners had gone through with the same frequency, perhaps indicating some
cognitive decline, or that they have learned over the years that their
owners always come back in the end.
Anim. Behav. 80, 1057-1063 (2010)


Slip and slide pores for sensors

Taking their inspiration from
nature, researchers have coated
nanopores with fluid bilayers
to sense single proteins. The
creation, which mimics the
pores in the olfactory system
of a silk moth (pictured), was
developed by Michael Mayer 
at the University of Michigan 
in Ann Arbor, Jerry Yang at the 
University of California, San 
Diego, and their team. 

By modifying the lipid 
with specific ligands, the 
researchers can control which 
proteins move through the 
pore, and how long their 
journey takes. The system can 
also be tweaked to slow down 
proteins that would otherwise 
translocate too fast to be 
analysed accurately. 
Nature Nanotechnol. doi:10.1038/ 
NNANO.2011.12 (2011) 


POLICY


Irish election 


Scientists in the Republic of 
Ireland hope that support for 
science will continue after
the Fine Gael party came
to power in elections held
on 25 February, ousting the
long-standing incumbents,
Fianna Fáil. Fine Gael,
which must form a coalition
government, will have to deal 
with the country’s economic 
crisis by cutting some public 
spending. The party is opposed 
to research using human 
embryonic stem cells, which 
has never benefited from clear 
regulation in Ireland. See 
go.nature.com/ftx2hu for more. 


Forest mission 

India will spend 460 billion 
rupees (US$10 billion) over a 
decade planting new forests 
and improving the quality of 
tree cover in existing forests, 
according to a plan approved 
by the Prime Minister’s 
Council on Climate Change 
on 23 February. Subject to 
expected parliamentary 
approval, this “National
Mission for a Green India” —
one of eight missions under a 
national action plan on climate 
change — will start from 2012. 


Push for carbon tax

Australia’s prime minister Julia
Gillard has proposed placing
a fixed tax on carbon dioxide
from July 2012, calling the
move “an essential economic
reform”. It is the third time
that Australia’s government
has vowed to tax carbon
emissions to tackle climate
change; Gillard’s predecessor
Kevin Rudd twice failed to
get a carbon-cutting bill
past his Senate. Speaking on
24 February, Gillard said she 
hoped to move to a market- 
based emissions trading 
scheme three to five years after 
the fixed price comes in. Its 
value has not yet been decided. 


The news in brief 


Dire threats to coral reefs

More than 60% of the world’s coral reefs are
directly threatened by local human activities
such as coastal pollution and destructive
fishing. When global pressures, including rising
ocean temperatures or ocean acidification, are
taken into account, about 75% are threatened,
with the proportion expected to rise to 90%
by 2030. The World Resources Institute in
Washington DC published the statistics on
23 February in Reefs at Risk, a report updating
a 1998 study. The latest report emphasized that
reefs affect society, providing food and coastline
protection, and said that they can rebound if
communities stop unsustainable practices.


Egypt reshuffle

As protests continue across
the Arab world, Egypt’s
interim cabinet was reshuffled
last week and included
new appointees to oversee
education and science. Amr
Salama, a professor of civil
engineering, is minister of
scientific research, replacing
Hani Helal. Ahmed Gamal
Moussa replaces Ahmed Zaki
Badr as education minister.
Both appointees are respected
by scientists and had held
similar positions in 2004, only
to be sacked a year later. They
may have little chance to make
an impact, with the interim
government in place for six
months at most. See go.nature.
com/ghqimz for more.


India’s budget
Indian scientists were
disappointed by increases
in funding for research
agencies in the country’s
2011-12 budget, presented
on 28 February. The Indian
ministry for science and
technology saw a 17% increase
on last year’s budget to some
75.5 billion rupees (US$1.67
billion), while atomic energy
and space also saw double-
digit percentage increases. But
with the economy booming
and inflation running above
8%, “if we want to catch up
with China we must make big
investments in science”,
C.N.R. Rao, chairman of the
prime minister’s scientific
advisory council, told Nature.
“These lollipops will not do.”


Shuttle swansong
NASA’s space shuttle Discovery
launched for its 39th and
final flight on 24 February,
taking six astronauts as well as
supplies and additional science
capabilities to the International
Space Station on an 11-day
mission. NASA’s other two
shuttles are each due to fly
once more this year before the
agency’s shuttle fleet retires.


Unethical studies 
A meeting of the US 
presidential bioethics 
commission in Washington 
DC this week triggered 
reporting of past unethical 
human experiments by US 
researchers, mostly from
the 1940s to the 1960s. The
commission met in part to 
discuss last year’s revelations 
that US government 
researchers secretly gave 
syphilis to hundreds of 
Guatemalan prison inmates 
in the 1940s (see Nature 467, 
645; 2010). But the Associated 
Press, trawling medical 
journals and old newspaper
articles, dug up more than 40
instances of similarly dubious 
tests. All had been publicly 
disclosed, unlike the syphilis 
experiments, but did not draw 
the condemnation at the time 
that they would today. 


Oil-spill health study 


A study claiming to be the 
largest ever to follow up the 
long-term effects of an oil 
spill on human health was 
launched on 28 February
(see nihgulfstudy.org).
The National Institutes of
Health says it has committed 
US$19 million to the project 
so far; its National Institute 
of Environmental Health 
Sciences hopes to spend a 
decade following 55,000 of the 
workers and volunteers who 
supported the clean-up effort 
after the Deepwater Horizon 
disaster in the Gulf of Mexico. 


Booking a rocket 


The first contracts have been
signed to send researchers into
suborbit using commercial
spacecraft. The Southwest
Research Institute, in San
Antonio, Texas, said last week
it had paid for six scientists
to fly with XCOR Aerospace,
based in Mojave, California,
and had paid deposits for two
scientists to fly with Virgin
Galactic, whose spacecraft
will take off from Spaceport
America in New Mexico.
The institute may opt to
purchase a total of 17 seats
with the two companies, each
costing US$100,000-200,000.
Scientists would conduct
experiments including
biomedical monitoring and
atmospheric imaging.


TREND WATCH

Developing countries look
poised to overtake industrialized
countries in planting genetically
modified (GM) crops (see
chart). Brazil, Argentina, India,
China and South Africa together
accounted for 43% of the global
total of biotech crops planted
commercially last year. In 2010,
Pakistan and Myanmar grew GM
crops commercially for the first
time, opting for biotech cotton.
Sweden also made its first foray
into commercial GM crops,
planting the Amflora high-
starch potato.


Viral response plan 


Medical virologists from
around the world gathered in
Washington DC on 1-3 March
to work out the details of
a Global Virus Response
Network. Meeting attendees,
invited by virologist Robert
Gallo of the University of
Maryland School of Medicine
in Baltimore, hope to form
an organization that would
act as a global first-responder
to identify, investigate and
eradicate viral outbreaks.
The network would also
inform governments, health
organizations and the public
about existing viruses and
attract scientists to the field.


Wheat killer 


A research programme
tackling a devastating wheat
fungus has been granted
US$40 million over five
years as part of a partnership
between the Bill & Melinda
Gates Foundation in Seattle,
Washington, and the UK
Department for International
Development. The Durable
Rust Resistance in Wheat
project, involving more
than a dozen institutes and
coordinated by Cornell
University in Ithaca, New
York, aims to create plants
that can withstand strains
of the evolving stem-rust
pathogen Ug99. See go.nature.
com/4wm$8ste for more.


PEOPLE 


German plagiarism 
Germany’s defence minister,
Karl-Theodor zu Guttenberg
(pictured), has resigned
after a row over plagiarism
in his PhD thesis. The
University of Bayreuth
withdrew Guttenberg’s
doctoral thesis in law last
week, confirming that large
parts of the document, written
in 2006, were plagiarized.
German chancellor Angela
Merkel initially argued that
academic wrongdoings
didn’t diminish Guttenberg’s
political merits, but public
pressure forced his resignation
on 1 March. Thousands
of German academics and
doctoral students had joined
the outcry, signing an online
letter complaining that Merkel
was trivializing academic
plagiarism.


DEVELOPING COUNTRIES DRIVE GM CROP RISE 


Planting of genetically modified (GM) crops grew by 
10% in 2010 to 148 million hectares worldwide.

SEVEN DAYS | THIS WEEK


3-6 MARCH 

The American
Association for
Cancer Research
hosts a conference in
Vancouver, Canada,
exploring links between
stem cells and cancer.
go.nature.com/Slwqim 


7-11 MARCH 
Preliminary analysis 
of dust picked up from 
a distant asteroid last 
year by the Hayabusa 
spacecraft will be among 
highlights of the 42nd 
Lunar and Planetary 
Science Conference in 
The Woodlands, near 
Houston, Texas. 
go.nature.com/eugg9g 


9-13 MARCH 

The 10th International 
Conference on 
Alzheimer’s & 
Parkinson's Diseases will 
take place in Barcelona, 
Spain, and focus on new 
possibilities for treating 
the conditions. 
go.nature.com/jegygu 


Climate inquiry 

An inquiry has exonerated
climate scientists with
the National Oceanic and
Atmospheric Administration
in Washington DC of data
manipulation or unethical
behaviour. Requested
by Senator James Inhofe
(Republican, Oklahoma), it is
the latest of many investigations
to clear researchers of
implications of scientific
misconduct in e-mails from the
Climatic Research Unit at the
University of East Anglia, UK,
leaked in November 2009. In a
report released on 24 February,
the inspector general of the US
commerce department, who
headed the inquiry, found no
evidence of wrongdoing in the
e-mails.


> NATURE.COM 
For daily news updates see:
www.nature.com/news


Unease over rapid revamp of National Institutes of Health p.15

Precise gene editing makes the move into the clinic p.16

China counts the cost of fast development p.19

Native Americans change from subjects to investigators p.25


C. MARCELLONI/CERN 


“Any squarks in here?” The ATLAS detector (above) at the Large Hadron Collider has failed to find predicted ‘super partners’ of fundamental particles. 


| PHYSICS | 


Beautiful theory collides 
with smashing particle data 


Latest results from the LHC are casting doubt on the theory of supersymmetry. 


BY GEOFF BRUMFIEL 


“Wonderful, beautiful and unique”
is how Gordon Kane describes
supersymmetry theory. Kane,
a theoretical physicist at the University of
Michigan in Ann Arbor, has spent about
30 years working on supersymmetry, a theory
that he and many others believe solves a host
of problems with our understanding of the
subatomic world.


Yet there is growing anxiety that the theory,
however elegant it might be, is wrong. Data
from the Large Hadron Collider (LHC), a
27-kilometre proton smasher that straddles the
French–Swiss border near Geneva, Switzerland,
have shown no sign of the ‘super particles’ that the
theory predicts¹⁻³. “We’re painting supersymmetry
into a corner,” says Chris Lester, a particle physicist
at the University of Cambridge, UK, who works
with the LHC’s ATLAS detector. Along with the
LHC’s Compact Muon Solenoid experiment,
ATLAS has spent the past year hunting for
super particles, and is now set to gather more
data when the LHC begins a high-power run in
the next few weeks. If the detectors fail to find
any super particles by the end of the year, the
theory could be in serious trouble.

Supersymmetry (known as SUSY and pronounced
‘Susie’) emerged in the 1970s as
a way to solve a major shortcoming of the
standard model of particle physics, which 
describes the behaviour of the fundamental 
particles that make up normal matter (see 
‘The bestiary’). Researchers have now found
every particle predicted by the model, save 
one: the Higgs boson, theorized to help endow 
other particles with mass. 

The Higgs is crucial to the theory, but its 
predicted mass is subject to wild fluctuations 
caused by quantum effects from other fun- 
damental particles. Those fluctuations can 
increase the Higgs’ expected mass to a point 
at which other fundamental particles should 
be much more massive than they actually 
are, effectively breaking the standard model. 
Theorists can eliminate the fluctuations from 
their equations, but only by setting the Higgs 
mass to a very precise value — a fraction heav- 
ier or lighter and the whole theoretical edifice 
collapses. Many physicists are uncomfortable 
with any theory that requires such delicate 
fine-tuning to work. 

SUSY offers an alternative to this ‘fine- 
tuning’ problem. The theory postulates that 
each regular particle has a heavier supersym- 
metrical partner, many of which are unstable 
and rarely interact with normal matter. The 
quantum fluctuations of the supersymmetri- 
cal particles perfectly cancel out those of the 
regular particles, returning the Higgs boson to 
an acceptable mass range. 

Theorists have also discovered that SUSY 
can solve other problems. Some of the lightest 
supersymmetrical particles could be the elusive 
dark matter that cosmologists have been hunt- 
ing for since the 1930s. Although it has never 
been seen, dark matter makes up about 83% of 
the matter in the Universe, according to obser- 
vations of how galaxies move. SUSY can also 
be used to bring together all the forces except 
gravity into a single force at high energies, a 
big step towards a ‘theory of everything’ that 
unifies and explains all known physics — one 
of the ultimate goals of science. Perhaps most 
important for some theorists, “SUSY is very 
beautiful mathematically”, says Ben Allanach, 
a theorist at the University of Cambridge. 

SUSY’s utility and mathematical grace have 
instilled a “religious devotion” among its 
followers, says Adam Falkowski, a theorist at 
the University of Paris-South in France. But 
colliders have failed to turn up direct evidence 
of the super particles predicted by the theory.


THE BESTIARY

Could shadowy super particles be lurking
behind the standard model's observed
fundamental particles and forces?

Quarks Leptons Force carriers
Squarks Sleptons Gauginos


SUSY’S MID-LIFE CRISIS

1970-74 Several theorists independently develop SUSY

Supersymmetric version of the standard model proposed

1983 SUSY used to explain dark matter

SUSY suggested as a way to unify electroweak and strong forces

2000 Large Electron Positron collider (the LHC's predecessor) fails to find evidence of SUSY particles called sleptons

2008 Tevatron sets mass limits on supersymmetric quarks (squarks)

2011 LHC tightens limits on SUSY masses


The Tevatron at the Fermi National Accelera- 
tor Laboratory in Batavia, Illinois, for example, 
has found no evidence of supersymmetrical 
quarks (‘squarks’) at masses of up to 379 gigae- 
lectronvolts (energy and mass are used inter- 
changeably in the world of particle physics). 
The LHC is now rapidly accumulating data 
at higher energies, ruling out heavier territory
for the super particles. This creates a serious
problem for SUSY (see ‘SUSY’s mid-life crisis’). 
As the super particles increase in mass, they 
no longer perfectly cancel out the troubling 
quantum fluctuations that they were meant to 
correct. Theorists can still make SUSY work, 
but only by assuming very specific masses for 
the super particles — the kind of fine-tuning 
exercise that the theory was invented to avoid. 
As the LHC collects more data, SUSY will 
require increasingly intrusive tweaks to the 
masses of the particles. 

So far the LHC has doubled the mass limit 
set by the Tevatron, showing no evidence of 
squarks at energies up to about 700 gigaelec- 
tronvolts. By the end of the year, it will reach 
1,000 gigaelectronvolts — potentially ruling 
out some of the most favoured variations of 
supersymmetry theory. 

“I wouldn't say I'm concerned,” says John
Ellis, a theorist at CERN, Europe's particle- 
physics lab near Geneva, who has worked on 
supersymmetry for decades. He says that he will 
wait until the end of 2012 — once more runs 
at high energy have been completed — before 
abandoning SUSY. Falkowski, a long-time critic 
of the theory, thinks that the lack of detections
already suggests that SUSY is dead.

“Privately, a lot of people think that the
situation is not good for SUSY,” says Alessandro
Strumia, a theorist at the University of Pisa in
Italy, who recently produced a paper about the
impact of the LHC’s latest results on the
fine-tuning problem⁴. “This is a big political issue in
our field,” he adds. “For some great physicists, it
is the difference between getting a Nobel prize
and admitting they spent their lives on the
wrong track.” Ellis agrees: “I've been working
on it for almost 30 years now, and I can imagine
that some people might get a little bit nervous.”

“Plenty of things will change if we fail to dis- 
cover SUSY,” says Lester. Theoretical physicists 
will have to go back to the drawing board and 
find an alternative way to solve the problems 
with the standard model. That's not necessarily 
a bad thing, he adds: “For particle physics as a 
whole it will be really exciting.” ■ SEE EDITORIAL P.6


1. ATLAS Collaboration. Preprint at http://arxiv.org/abs/1102.2357 (2011).
2. CMS Collaboration. Preprint at http://arxiv.org/abs/1101.1628 (2011).
3. ATLAS Collaboration. Preprint at http://arxiv.org/abs/1102.5290 (2011).
4. Strumia, A. Preprint at http://arxiv.org/abs/1101.2195 (2011).


NIH revamp rushes ahead


Translational-science centre remains on the fast track, despite concerns about upheaval. 


BY MEREDITH WADMAN 


Jeremy Berg was fighting rush-hour traffic
on his way home from the US National
Institutes of Health (NIH) in Bethesda,
Maryland, on 8 February when he took an
unexpected call. On the line was a senior
NIH official who was helping to plan the
dismantling of the agency’s National Center for
Research Resources (NCRR) to make way for a
translational-medicine centre strongly backed
by Francis Collins, director of the NIH.

The caller asked Berg, who is head of the 
US$2-billion National Institute of General 
Medical Sciences (NIGMS) at the NIH, to 
consider whether his institute could absorb the 
Institutional Development Award (IDeA), an 
NCRR programme that builds research
infrastructure in states with historically limited
success at winning NIH grants. He wanted an 
answer by the following day. 

“I was given approximately 24 hours to
decide whether NIGMS should take on a large
(>$200M), complicated program not closely
related to our core mission,” Berg wrote on
22 February in an open letter to the Scientific
Management Review Board, which advises
Collins on structural changes at the NIH. Berg
agreed to absorb the programme, but “with
very little comfort that this was a sound
decision”. He went on to urge the board to fight the
hasty dissolution of the NCRR.

Berg’s complaint is one of a deluge facing
Collins and his staff as they rush to launch
the National Center for Advancing Translational
Sciences (NCATS) by the start of the US
government's 2012 budget year on 1 October.
Most critics do not disagree with the reasoning
for the proposed centre: Collins wants the
NIH to become more strategically engaged in
turning promising compounds into clinically
approved drugs, a process that often stalls for
lack of resources and know-how. Rather, many
are challenging the speed at which NCATS is
being established, and the even greater speed
with which Collins decided in December to
dissolve the NCRR, transfer a significant portion
of it to the new centre and scatter
the rest across the NIH (see graphic). It
will be the first such break-up in the NIH’s
81 years.

Sixteen US senators wrote to Collins on 14 February,
supporting IDeA and urging him to slow the
pace of the reorganization to gauge its impact.
Other critics fear that the changes will put
at risk NCRR programmes that they say are
working extremely well. “Why are we fixing
what isn’t broken?” asks Brad Bolon, director
of GEMpath, a biopharmaceutical consultancy
in Longmont, Colorado (see page 36).

Collins, who has made translational
research one of five priorities for his tenure
at the NIH, sees NCATS as removing the risk
from early-stage therapeutic compounds by
bringing them through the first phases of drug
development, to the point at which companies
are willing to license them. The centre would
consolidate several existing NIH projects,
most prominently the Clinical and Translational
Science Awards, which, at $490 million
in 2010, is the largest NCRR programme. And
if congressional spending committees agree to
a White House request for a 10% budget boost
for the NIH Office of the Director, which is
responsible for organizing programmes across
the agency, Collins plans to channel $100 million
of that money to NCATS to fund the Cures
Acceleration Network, a grant programme
supporting ‘high need’ drug-development
projects.

Collins’ decision to push ahead quickly with
NCATS means that the rest of the $1.3-billion
NCRR, which supports a diverse collection
of infrastructure and training programmes,
from primate-research centres to high-end
instrumentation grants, cannot simply be
left intact. This is because of a 2006 law that
caps the number of NIH institutes and centres
at 27; the dissolution of the NCRR creates the
needed opening for NCATS. Lawrence Tabak,
principal deputy director of the NIH and
co-chairman of the NCRR Task Force, a working
group deciding what to do with the remaining
pieces of the centre, says that, “given the
opportunity to think this through”, the group
had decided that the remaining NCRR programmes
would thrive better if strategically
placed in other NIH institutes and in the
Office of the Director.


CHANGE AT THE US NATIONAL INSTITUTES OF HEALTH

The proposed dissolution of the National Center for Research Resources
and creation of the National Center for Advancing Translational Sciences
will lead to some programme relocations (figures are for fiscal year 2010).


AS OF MARCH 2011

National Center for Research Resources
Clinical and Translational Science Awards (US$490 million)
Shared and High-End Instrumentation ($65 million)
Division of Comparative Medicine ($197 million)
Extramural Construction ($1 billion, 2009-10)
Institutional Development Award ($229 million)
Research Centers in Minority Institutions ($59 million)
Biomedical Technology Research Resources ($150 million)

Office of the Director
Rapid Access to Intervention Development ($5.8 million)
Molecular Libraries Program ($113 million)

National Human Genome Research Institute
Therapeutics for Rare and Neglected Diseases ($24 million)


BY OCTOBER 2011

National Center for Advancing Translational Sciences
Clinical and Translational Science Awards
Therapeutics for Rare and Neglected Diseases
Cures Acceleration Network ($100 million)*
Rapid Access to Intervention Development
Molecular Libraries Program

Office of the Director: Infrastructure Entity
Shared and High-End Instrumentation
Division of Comparative Medicine
Extramural Construction

National Institute of Biomedical Imaging and Bioengineering
Imaging and Point-of-Care Biomedical Technology Research Centers grants
Imaging and Point-of-Care research grants for Technology Research and Development

National Institute of General Medical Sciences
Institutional Development Award
All other Biomedical Technology Research Centers grants
All other research grants for Technology Research and Development

National Institute on Minority Health and Health Disparities
Research Centers in Minority Institutions

* Proposed by President Barack Obama for fiscal year 2012

Many constituents of the NCRR fear for the 
futures of their programmes in institutes that 
didn’t sign up for them and may not share the 
NCRR’s commitment. “Dr Tabak and Francis 
Collins say this is going to be budget neutral,” 
says one member of the NCRR’s external advi- 
sory council. “But when you take a programme 
from one institute and hand it to another, per- 
haps without their agreement, you know that 
within five years or so that orphan programme 
could be budgeted out of existence.” 

That concern was especially evident in
mid-January, when Tabak’s group proposed a
‘straw’ model (designed to generate discussion)
for the dissolution of the NCRR that
showed much of its portfolio in an ‘interim
infrastructure unit’. Some critics were mollified
when Tabak issued a revised plan last week,
calling the infrastructure entity permanent and
placing it in the Office of the Director. The
latest plan includes other adjustments: for
example, the straw model had divided the
NCRR’s primate and non-primate animal-model
resources, but the revised model keeps
them together under the director's office.


NATURE.COM
For more on Francis Collins’s plans for the NIH, visit:
go.nature.com/guzqcy


Stuart Zola’s research centre is slated to become 
the responsibility of the NIH Office of the Director. 


Stuart Zola, director of the Yerkes National 
Primate Research Center in Atlanta, Geor- 
gia, which is currently funded by the NCRR, 
is one of those whose fears were soothed by 
the adjustments. “Given that we were going 
to be moved, it makes sense to be moved into 
another broad-based environment” rather 
than a disease-specific institute, he says. 

“The willingness to listen to the stakeholders
is very clearly evident in the new document,”
says William Talman, president of the Federa- 
tion of American Societies for Experimental 
Biology in Bethesda, Maryland, who praises 
Collins for making a “bold stroke” in launching 
NCATS. Still, he says: “I don’t think I will be 
comfortable until the test of time determines 
exactly what the outcome is.” 

Those seeking to challenge the dismantling 
of the NCRR will have another opportunity to 
voice concern at a meeting for stakeholders on 
14 March. However, the window of opportu- 
nity to stop the process is narrowing. Collins 
plans to deliver a detailed budget for the new 
centre to Congress in the coming weeks, and 
last week he told reporters that he is preparing 
to search for the future director of NCATS. 


GENE THERAPY 


Targeted gene editing 
enters clinic 


Patients with HIV first to receive experimental gene therapy. 


BY HEIDI LEDFORD 


A gene-therapy method that specifically
disrupts a single gene may have had its
first success in the clinic, potentially
boosting immune-cell counts in a small
number of patients with HIV. The results,
presented on 28 February at the Conference
on Retroviruses and Opportunistic Infections
in Boston, Massachusetts, mark an important
therapeutic test for enzymes known as zinc
finger nucleases, small proteins that can
be designed to bind to and edit specific DNA
sequences by virtue of their zinc-bearing
structures.

The study, a phase I safety trial, tested a zinc
finger enzyme developed by Sangamo BioSciences
in Richmond, California. It included
six men with HIV who were already taking
the standard regimen of antiretroviral drugs.
The drugs had kept the virus at bay, but the men's
immune-cell counts remained abnormally
low. Researchers removed a sample of CD4⁺
T cells, the type of immune cells affected by
HIV, from each man and used Sangamo’s
enzyme to disrupt the CCR5 gene, which
encodes a protein that HIV uses to enter
CD4⁺ cells. The engineered cells were then
infused back into the patients. Immune-cell
counts subsequently rose for five of the six
patients who received the therapy.

“It’s very exciting,” says John Rossi, a molecular
biologist at the City of Hope’s Beckman
Research Institute in Duarte, California. “If
they did this several times in a given patient,
you could establish a high percentage of
resistant cells.”

The inspiration for targeting the CCR5 
gene comes from the small percentage of 
people who, thanks to a natural mutation in 
the gene, are resistant to most types of HIV 
infection. At the meeting on Monday, Jacob 
Lalezari of Quest Clinical Research in San 
Francisco, California, reported that the engi- 
neered cells migrated throughout the body 
and thrived in the gut mucosa — a key reser- 
voir of HIV. No serious side effects were seen. 

The zinc finger nuclease technique is
promising for the treatment of many diseases
beyond HIV, says Patrick Aubourg,
who studies gene therapy at France's national
biomedical agency INSERM in Paris. The
method could replace the more common
technique of inserting modified genes into
the genome, in which researchers have less
control over the gene in question. But he
cautions that the technique still has a relatively
low efficiency and might have off-target
effects.
Meanwhile, Rossi, who is himself embarking
on an HIV study that will use Sangamo’s
zinc finger nucleases, says that it is not yet
clear whether the patients’ CD4⁺ cell
count rose because of the CCR5 disruption
or because the extracted cells
were activated as part of the protocol
for growing them outside the body. And
because levels of HIV were already below the
threshold of detection in these patients, it is
too early to say what effect the therapy could
have on patients that have more of the virus.
Researchers do not yet know what fraction of
a person’s CD4⁺ cells would need to be
HIV-resistant to significantly rein in the virus’s
spread and liberate patients from a lifetime
of antiretroviral drugs.

“It's going to take a while to put all of those
pieces together,” says Carl June, who studies
T cells at the University of Pennsylvania in
Philadelphia, and is an investigator on another
HIV trial involving Sangamo’s nuclease. “But
it’s at least conceivable now.” ■


Traditional drug-discovery
model ripe for reform 


Academic researchers set to play much greater role in pharmaceutical development. 


BY DANIEL CRESSEY 


With drug pipelines running dry and
a slew of blockbuster medicines
about to lose patent protection,
the voices arguing that the traditional
drug-development process is too expensive and
inefficient to survive are getting louder.

Employing thousands of in-house scientists 
to develop drug candidates from scratch has 
turned into a billion-dollar gamble that simply 
isn't delivering enough profitable products to 
market. Bernard Munos, founder of the Inno- 
Think pharmaceutical policy research group in 
Indianapolis, Indiana, is not alone in believing 
that the next three years “will probably see an 
implosion of the old model” of drug discovery. 

So what comes next? Cutbacks, certainly: 
witness Pfizer's dramatic announcement early 
last month that it will soon close its research 
site at Sandwich, UK, and slice roughly 
US$1.5 billion from its proposed 2012 research 
and development spend (see Nature 470, 154; 
2011). But beyond that, perhaps, a rethink of 
the old divisions of labour is needed. 

Canny drug-makers are listening to those 
who propose that they should increasingly 
outsource early-stage drug development, 
including phase I safety trials, to academia 
or to small, specialist companies. This would 
leave pharmaceutical companies to focus on 
their strengths: running large clinical trials and 
marketing medicines. 


One such model … industry, academia
and funding agencies. The meeting was co-convened by Chas
Bountra, head of the Structural Genomics 
Consortium at the University of Oxford, UK, 
who argues that a key problem with the cur- 
rent system is that companies tend to work in 
parallel, identifying similar or identical target 
molecules while remaining unaware that the 
compounds may have already been tested and 
discarded by rivals. “What we're trying to do is 
reduce that duplication,” he says. 

His scheme adopts the highly collaborative 


The kit may have improved, but the in-house drug discovery model has changed relatively little. 


approach pioneered by those working on cures 
for neglected diseases, in which intellectual 
property (IP) restrictions are lifted. Compa- 
nies would begin to compete only after early 
clinical trials had shown a drug to be safe and 
potentially effective. Up to that point, all data 
on prospective drug candidates would be pub- 
lished openly. This would allow targets to be 
validated much more quickly, says Bountra, 
potentially giving enormous savings in cost. 
It would also prevent companies “exposing 
patients to molecules that other organizations 
already know are going to be ineffective”. 

The model would rely heavily on academic 
scientists supported by a global initiative cost- 
ing about $325 million a year, with half com- 
ing from the pharmaceutical industry and half 
from public and charitable sources. Successful 
drug candidates would be made available for 
the initiative’s commercial sponsors to buy and 
bring to market. 

Industry already believes that this is a fine 
solution for programmes in areas without 
major commercial interest, such as neglected 
diseases, says Stephen
Friend, an organizer [...]
in Seattle, Washington. The key difference in
the Toronto proposal is that it may also be a 
“viable way to improve return on investment in 
commercially important areas”, he says.

Bountra is confident that within the next 
two months he will complete negotiations 
to sign up two industry partners, two public 
funding partners and two academic partners. 
The response at the meeting, to which all the 
large drug companies sent representatives, was 
very positive from all involved, say attendees. 
“The more we discussed it, the more convinced 
we were that this is the only way forward,” 
says Bountra. A follow-up meeting in San 
Francisco in April will flesh out the plans. 

Meanwhile, government funders of research 
are trying out similar initiatives. The UK 
Medical Research Council has established the 
Developmental Pathway Funding Scheme 
to support the development of basic science 
into drugs and medical devices. And Francis 
Collins, head of the US National Institutes of 
Health, is proposing a National Center for 
Advancing Translational Sciences to push 
more basic science towards the medical 
market (see page 15). 

Ted Bianco, director of technology trans- 
fer for the Wellcome Trust, a UK biomedical 
research funder, agrees that shifting early-stage 
drug discovery work to academia could fix 
some of pharma’s problems. The trust’s Seeding
Drug Discovery initiative already funds
researchers to optimize drug candidates 
and take them through to clinical trials. But 
Bianco points out that commercial partners 
in Bountra’s initiative would expect to see a 
financial return: “The dilemma is that heavy 
investment is required, and it has to be car- 
ried by somebody’s money.” 

Bountra’s IP-free model could also 
deprive collaborating universities of the 
opportunity to profit from spin-out com- 
panies, says Melanie Lee, chief executive 
of Syntaxin, a biotech company based in 
Oxford, UK, who attended the Toronto 
meeting. Bountra says he doubts that will 
discourage academics, who get into drug 
discovery to develop medicines, not to 
acquire intellectual property. 

But Patrick Vallance, senior vice-pres- 
ident for medicines development and 
discovery at London-based drug-makers 
GlaxoSmithKline (GSK), also believes 
that IP will be the most contentious part of 
Bountra’s model. “I'm completely on board 
with the idea you don’t really know if you're
on track until you've done an experiment in
the clinic, and that you should publish that
early,” he says. But “it's much more complex
to determine where you need to secure the 
IP along that chain, and I think it will differ 
from molecule to molecule”. 

Nevertheless, his company is experi- 
menting with open innovation, having last 
year put more than 13,000 potential anti- 
malarial drug structures into the public 
domain to encourage academics to iden- 
tify promising leads. “One of the reasons I 
want to push that model very hard is that if 
it works in malaria, and we've yet to see
what the uptake from academics and others
is, I don't see how you could do anything
but pursue it in other areas,” says Vallance. 

Vallance notes that industry is also 
developing new models of academic collabo- 
ration. GSK announced this month that it 
will collaborate with Mark Pepys, head of 
medicine at the Royal Free and University 
College Medical School in London, to 
develop a candidate drug for amyloidosis, a 
protein disorder. The idea, says Vallance, is 
not just to buy up promising molecules, but 
to form long-term partnerships that last all 
the way through drug development. 

All these models put academic research- 
ers at the heart of drug discovery. But they 
will fail unless more money flows from 
governments, charities and industry into 
academic labs, says Cathy Tralau-Stewart, 
who heads Imperial College London’s drug 
discovery research unit. “Academic drug 
discovery is growing and is becoming much 
more important,” she says, “but if we don't
solve the funding issues, then the pharma 
companies will not have a pipeline of 
innovative drugs in ten years’ time.”

RECESSION WATCH


Budget woes sink 
marine archive 


Oceanographic library could be a casualty of California’s
$25-billion deficit.

BY ERIKA CHECK HAYDEN


The fiscal crisis at the University of
California looks set to engulf the world’s
largest collection of research materials
focused on marine sciences.

On 11 February, Brian Schottlaender, 
librarian at the University of California, San 
Diego (UCSD), proposed closing the Scripps 
Institution of Oceanography Library, along 
with four other libraries affiliated with UCSD, 
including the Medical Center Library and the 
Science & Engineering Library. 

Schottlaender had been asked to cut 
US$6 million out of his $25-million budget as 
part of a $500-million reduction for the entire 
University of California 
system. Newly elected 
state governor Jerry 
Brown, who faces a 
$25-billion state deficit, 
announced the reduc- 
tion in January. The 
Scripps library closure 
is the highest-profile
casualty of the cuts so
far, but it is unlikely to
be the last.

The library includes
some 227,000 books
and 700 print periodicals
along with an
extensive archive that 
charts the history of 
oceanography, includ- 
ing documents from 
the 1872-76 voyage of 
HMS Challenger — a 
landmark global oceanographic expedition. 
News of the planned closure has elicited a 
storm of protest. “Closing the Scripps library 
is almost unthinkable,” says Walter Munk, 
a pioneering oceanographer who spent his 
entire professional career at the Scripps insti- 
tution. “The Scripps library is a unique asset 
to the community of oceanographers every- 
where.” A group of Scripps graduate students 
has organized a petition opposing the closure. 

SHRINKING POOL: The total libraries budget for the ten University of California campuses has dropped sharply since 2008.

Select collections and services from the
library could be moved to a larger library
on UCSD’s main campus as early as this
summer, according to Schottlaender. And
Peter Brueggeman, director of the Scripps
library, notes that about half of the library’s 
collection has been digitized through a part- 
nership with Google. But, he says, “the real- 
ity is that many research-oriented library 
resources are not yet digitized, are not freely 
available or are not affordable at this time”. 

Schottlaender points out that, before this 
year’s proposed cut, his budget had already 
been reduced by 16% since 2008 and that the 
Scripps library, with 34,000 visitors last year, is 
not as heavily used as other libraries on campus.
“My hands are more or less tied. Everything
is getting cut everywhere,” Schottlaender says.

Other libraries are feeling the pinch too (see 
‘Shrinking pool’). At the University of Cali- 
fornia, San Francisco, 
librarian Karen Butter 
says that she doesn’t 
have the budget to sub- 
scribe to some data- 
bases that researchers 
want, such as BIOBASE, 
which contains prod- 
ucts such as the Human 
Gene Mutation Data- 
base, an archive of 
mutations associated 
with disease. She adds 
that the university is 
negotiating with pub- 
lishers to lower the cost 
of online access to indi- 
vidual journals, because 
packages of journals 
are no longer afford- 
able. On 1 January, the 
University of California 
library system cancelled its site licence to the 
Informa Healthcare journals — the first time 
the university has cancelled a subscription to 
a ‘bundle’ of journals. 

At the University of California, Santa Cruz, 
hundreds of undergraduate and graduate stu- 
dents occupied its science and engineering 
library in 2009 and 2010 to protest over cuts 
in library hours. In May, the students voted 
to institute a $6.50 library fee per student per 
quarter to pay to keep the library open. But the 
fee ends in 2013. At that point, says university 
librarian Virginia Steele, “we're facing a really 
difficult dilemma”.
China faces up to ‘terrible’
state of its ecosystems


Wetlands hardest hit by land reclamation and pollution. 


BY JANE QIU IN BEIJING 


[...] the cost of decades of breakneck
development, Chinese scientists
and policy-makers last week outlined
the daunting challenges they face in trying to
halt the country’s environmental degradation.

Government officials at the Symposium 
on Ecosystem Monitoring and Evaluation 
in Beijing promised to step up investment in 
ecological conservation and restoration over 
the next five years, although no precise details 
were given. Other delegates warned that the 
lack of a national long-term strategic plan for 
the environment, compounded by insufficient 
coordination among government sectors, 
could jeopardize such efforts. 

“The ecological situation is terrible,” admits
Xu Jun of the Ministry of Science and Technol- 
ogy. More than a quarter of China’s grasslands, 
for instance, have been lost to farming and 
mining activities in the past decade, and 90% 
of the country’s remaining 4 million square 
kilometres of grassland is in poor health. The 
grassland loss contributes to problems such as 
water shortages and sandstorms. 

Coastal areas are under even greater pres- 
sure — from pollution, drainage and devel- 
opment. “Of all ecosystems, wetlands are the 
worst hit,” says Yu Xiubo, an ecologist at the
Beijing-based Institute of Geographic Sciences
and Natural Resources Research, part of the
Chinese Academy of Sciences (CAS).

WETLAND THREATS: China's most endangered ecosystems are being degraded by factors linked to economic development.

A recent report by the China Council for 
International Cooperation on Environment 
and Development (CCICED), a joint Chi- 
nese and international advisory board to the 
government, shows that 57% of the country’s 
coastal wetlands have disappeared since the 
1950s, largely due to land reclamation (see 
‘Wetland threats’). Over the same period, the 
area covered by mangrove forests and coral 
reefs fell by 73% and 80%, respectively. 

On the basis of development projects 
approved by the government, the authors of 
the CCICED report estimate that another 
5,800 square kilometres of coastal area will be 
lost by 2020, eating away at the total 385,000 
square kilometres of remaining wetlands. 


RECOVERY EFFORTS 

China has not ignored the problem. The forestry
ministry has been mapping wetlands nation- 
wide, and 2,538 nature reserves have been 
established covering about 15% of the coun- 
try’s total area, including half of the natural 
wetland ecosystems, according to Cui Lijuan, 
director of the Institute of Wetland Research 
in Beijing. However, nature reserves are often 
poorly protected from development. 

Over the past five years, the science ministry 
has spent 500 million renminbi (US$76 mil- 
lion) on the monitoring, evaluation and 
restoration of key ecosystems, says Xu. He says 
that funding will increase significantly, and will 
include a new focus on assessing the impact 
of pollution on public health. In collabora- 
tion with the CAS, the environment ministry 
will spend the next two years conducting a 
national ecological survey, following up on 
a survey done in 2000. Among the survey’s 
goals are an assessment of the services pro- 
vided by key ecosystems, and the impact of 
major engineering projects, including the 
Three Gorges Dam in central China. 

Goodbye mangroves, hello rice fields, but at what cost to China’s ecological health?

According to Zhong Xianghao, an ecologist
at the CAS Institute of Mountain Hazards
and Environment in Chengdu, monitoring
and restoring the [...]
15.5 billion renminbi between 2008 and 2015
for conservation projects and to create a moni- 
toring network of ten ecological stations in the 
region. An additional 13.4 billion renminbi 
per year will be paid to farmers and nomadic 
peoples to conserve grassland in parts of west- 
ern China, says Yang Zhi of the agriculture 
ministry. 

Yet China will struggle to preserve its 
remaining intact ecosystems (see ‘China’s
resources’) in the face of the growing demand
for land. This is being driven by population 
growth and by the government’s plan to quad- 
ruple the country’s gross domestic product 
between 2000 and 2020. 

And some delegates at the symposium used 
the Chinese saying jiulong zhishui, meaning
‘taming the water with nine dragons’, to
describe the overlapping monitoring efforts 
of various government ministries. These 
efforts are all too often short term and unco- 
ordinated, says Cui, when “it takes decades to 
get a good idea of the baseline and changes of 
ecosystems”.

Study of a lifetime


In 1946, scientists started tracking thousands of British children 
born during one cold March week. On their 65th birthday, the study 
members find themselves more scientifically valuable than ever before.


BY HELEN PEARSON 


On Tuesday 5 March 1946, Patricia Malvern was born in
a small flat in Cheltenham, UK, near the boilers that her 
dad stoked to warm the building above. She weighed in at 
9 pounds, 2 ounces (4 kilograms). 

The next day, David Ward was “one of the few Catholics born in a
Jewish hospital” opposite Hampton Court, near London. Ward doesn’t 
know exactly what he weighed, although his dad said later that he 
looked “like a skinned rabbit”. 

Throughout the rest of that week, just months after the end of the 
Second World War, 16,695 babies were born in England, Scotland and 
Wales. Health visitors carefully recorded the weights of the vast major- 
ity on a four-page questionnaire, along with countless other details 
including the father’s occupation, the number of rooms and occupants 
(including domestics) in the baby’s home and whether the baby was 
legitimate or illegitimate. Over subsequent years, the information files 
on more than 5,000 of these children thickened, then bulged. Through- 
out their school years and young adulthood and on into middle age, 
researchers weighed, measured, prodded, scanned and quizzed the 
group’s bodies and minds in almost every way imaginable.

This week, the group has much to celebrate. They are turning 65, 
the age at which many in the United Kingdom retire and, as such, 
a milestone in British life. They will also celebrate being part of the 
longest-running birth-cohort study in the world. These ordinary men 
and women are now some of the best-studied people on the planet. 
And this makes them some of the most scientifically valuable, because 
it has allowed researchers to track their health and wealth throughout 
their lives, and to search for factors that could explain their trajectories. 

The exercise has revealed some surprises. It has shown that the heavi- 
est babies were most at risk of breast cancer decades later; that children 
born into lower social classes were more likely to gain weight as adults; 
that women with higher IQ reached menopause later in life; and that
young children who spent more than a week in hospital were more 
likely to suffer behaviour and education problems later on. 


A generation under study 


All told, the results from the 1946 birth cohort — now known as the 
National Survey of Health and Development and run by the Medical 
Research Council (MRC) — have filled 8 books and some 600 papers 
so far. Perhaps more than anything else, the survey has shown that 
early life matters — a lot. “Ultimately, where you get to in early adult- 
hood is strongly influenced by where you come from,” says Michael 
Wadsworth, who led the study for nearly 30 years, until 2007. 
Children who were born into better socioeconomic circumstances 
were most likely to do well in school and university, escape heart dis- 
ease, stay slim, fit and mentally sharp and, so far at least, to survive. 
(Ward, whose father worked his way up in a Walthamstow-based dry- 
cleaning business, went on to university and built a career in journal- 
ism. Malvern, whose father left home when she was five and who wore 
third-hand clothes, left school at 16 and “bitterly regrets” the fact that 
her mother couldn't afford to pay tuition for her to train as a teacher.) 

Diana Kuh leads the UK National Survey of Health and Development, which has compiled thick files on more than 5,000 people since their birth in 1946.

NATURE.COM: Listen to a podcast about the 1946 birth-cohort study at go.nature.com/7rhmk3

Those lessons are arguably more urgent today
than they were in 1946 when, caught up in post-war
optimism, Britain was introducing major
educational reforms and a National Health Service
(NHS) to ensure that good schooling and
health were available to all. The contrast with the
country’s mood this winter couldn't be starker. Students have been
rioting to protest against the government’s plan to introduce £9,000
(US$14,600) annual fees for universities; plans are afoot to drastically 
reform the NHS (eviscerate it, critics say); and sweeping budget cuts 
are threatening public services — including early childhood support 
centres, for which the cohort’s data once helped provide impetus. “I find
these changes very worrying,” says Diana Kuh, who now directs the survey
and says she is saving up for her grandchildren to attend university.

“It's unique and groundbreaking in the history of epidemiology. It’s the 
only study to have chased an entire cohort across its life course — and 
it's not yet finished,” says Ezra Susser, an epidemiologist who works with 
cohort studies at Columbia University in New York. He says that cohort 
research has been vital in seeding the idea that disease evolves as a result 
of events throughout life. “You gain enormous depth of understanding in 
how that disease came to be by following someone over their life course.”

Now, as the cohort members enter old age, the study offers a precious 
opportunity to understand how a lifetime of experiences might hasten 
or slow their decline — an urgent question for countries such as the 
United Kingdom and United States, whose populations are rapidly age- 
ing and sickening. In the latest round of data collection, running from 
2006 to 2010 and costing £2.7 million, study members underwent almost 
every modern biomedical test, including echocardiograms, measures 
of blood-vessel function, whole-body bone, muscle and fat scans, and 
tests of blood, memory and how quickly they could get up from a chair. 

The data will provide a detailed starting point from which to measure 
the cohort members’ inevitable decline, and the opportunity to analyse 
the information is already swelling an extensive network of collabora- 
tors. Some are testing how genes interact with a lifetime of experiences 
to lead to obesity or disease; others plan to scan participants’ genomes 
for ‘epigenetic marks’ (molecular traces left, perhaps, by early birth
weight or by life’s inequalities) that alter gene expression and might
provide a molecular explanation for effects in later life. Greg Duncan, 
an economist at the University of California, Irvine, who studies the 
impact of child poverty, hopes that follow-up studies could help to 
answer a question arising from the earlier findings on socioeconomic 
status and health: “What are the active ingredients in social class?” 

It is this ability to draw associations between biological data, from 
blood pressure right down to genes, and life as it is actually lived that 
makes the cohort study so unusual, say its leaders. “These are real 
people,” says Kuh. “This is what it is to be human and normal.”


Next Steps in Making Motherhood Easier 


The first few decades of the twentieth century found Britain acutely 
concerned about its falling birth rate and stagnant infant mortality. 
(The thought at the time, as Kuh puts it, was “how are we going to main- 
tain Britain and its empire?”) A Population Investigation Committee 
recommended a maternity survey to explore whether the social and 
economic costs of childbearing were discouraging prospective parents. 
James Douglas was appointed to head it. 

Douglas, a physician, had spent part of the war conducting vast stud- 
ies of air-raid casualties. He set about launching an investigation that 
today would be ethically difficult, logistically nightmarish and finan- 
cially prohibitive: sending health visitors to interview the mothers of 
every child born in that March week. He reached 13,687 of them. “It was 
crazily ambitious,” says Wadsworth, who inherited the study leadership 
from Douglas more than three decades later. Yet “he pulled it off”.
In 1948, when Douglas’s book about the study’s results appeared,
the baby boom was in full swing and concerns about birth rate had
mostly dissipated. But the volume, Maternity in Great Britain, made a
stir by revealing shocking disparities between rich and poor in infant
survival and women’s care. One widely reported result showing that
only 20% of women who gave birth at home were offered pain relief,
and that the poor suffered most, spurred a parliamentary bill allowing
more midwives to deliver gas and air.

Douglas decided to turn the study into a tool for documenting social
inequality and gauging the impact of newly
minted welfare reforms such as the NHS. In
particular, he realized that he had the perfect
weapon for testing the success of the
1944 Education Act, which had introduced
a nationwide system of exams for 11-year-olds
(the 11+) intended to channel the
brightest, regardless of background, into elite
‘grammar’ schools. He selected a sample of
the original 13,687 children spanning geography
and social class, ending up with 5,362,
whose health, growth and other data were
regularly recorded and then transferred onto
punch cards. Douglas also tested the children’s
cognition as they reached 8, 11 and
15, and tracked their course through school.


BRITAIN’S SQUANDERED TREASURY OF TALENT


To the architects of the welfare state, the results 
were discouraging. Bright children from the 
middle classes were more likely to pass the 
11+ and do well at school than were equally 
bright working-class children, although supportive
parents and good teachers could better
a child’s odds. The attrition of smart but poor
boys (girls counted for less) became known
as the ‘waste of talent’, turning Douglas’s next
two books, The Home and the School (1964)
and All Our Future (1968), into must-read
educational references and contributing to the
introduction of non-selective ‘comprehensive’
schools in the 1960s.

While Douglas was studying the group’s 
diverging paths, the children were walking 
them. Malvern, who was cripplingly embar- 
rassed by taking free school meals, failed her 
11+. She blames a class teacher so violent that 
Malvern would sleep without covers in order 
to catch a cold and avoid school, and who 
“walloped me across the head” on the day of 
the exam. After she left school, Malvern went 
to learn typing at Government Communica- 
tions Headquarters in Cheltenham. Ward's 
father, meanwhile, was planning to buy a house, and his mother tested 
him on Latin vocabulary over the ironing. He was one of 4 children out 
of 66 in his school’s top two classes who passed the 11+ exam, and he 
and his sister were the first in their family to attend university. 

As the 1970s rolled on and the participants entered their thirties, 
Douglas was losing steam. Most of his questions about the cohort mem- 
bers’ education, occupations and social mobility had been answered, 
and Douglas was heading towards retirement. Medical epidemiologists 
thought that the cohort should be mothballed until its members got
interesting again, when they started to sicken and die. The MRC, which
had been funding the project since 1962, dithered about what to do
with it; even Douglas thought the project was finished.

David Ward as a baby in 1947 with his mother and sister; and in 1976 with his son and daughter.


LIFE’S PATTERN DECIDED
AT THE AGE OF SEVEN


For Wadsworth, a social epidemiologist who 
had joined Douglas’s team in 1968, it was 
just getting going. “I thought the changing 
pattern of health of these people would be 
interesting over life,” he says. 

After he took the helm in 1979, Wadsworth 
convinced the MRC to fund a new round of 
data collection as the cohort reached 36, 
then again at 43 and 53. He started assessing 
the group’s physical capabilities and health, 
including blood pressure, heart and lung func- 
tion, diet and exercise. He wanted to see how 
these indicators had been influenced by earlier 
life — and then chart them into the future. 

Correlations tumbled out of the data. In 
1985, Wadsworth and his team reported 
that cohort members whose birth weight 
had been low had higher blood pressure as 
adults’. It was an early hint that fetal and 
infant growth shape adult health, a link that 
became known as the Barker hypothesis after 
David Barker, an epidemiologist at the Uni- 
versity of Southampton, UK, who published 
a 1989 analysis of birth weight and health in 
a different cohort’. He found that babies with 
the lowest birth weights had the highest risk 
of heart disease as adults. 

Study after study from the 1946 cohort 
supported the link, showing a tangle of con- 
nections between infant and child growth or 
development and adult traits from cognitive 
ability to frailty, diabetes, obesity, cancer and 
schizophrenia risk. “It isn’t the same story 
every time, but we find an endless stream of 
long-term associations in quite ‘noisy’ data,” 
says Kuh. “Big babies were more likely to get 
breast cancer. Small babies were more likely to 
have poor grip strength. Those who grew fast 
postnatally have more cardiovascular risk.”
(Says Ward: “I find that quite extraordinary, 
almost in a poetic way, that there is something 
that spans all those years, that something was 
set down, determined at that stage.”)

A major question for scientists today is how 
to explain these connections: which biological 
systems in infants are so important, and how 
are lasting scars laid down on them? One possible answer lies in epigenet- 
ics: the chemical footprints, such as methyl groups, stamped on DNA by 
early life events that alter gene-expression patterns and might contribute 
to later disease. Martin Widschwendter, an oncologist at University Col- 
lege London (UCL), for example, is planning to analyse tens of thousands 
of possible methylation sites in the cohort’s DNA, looking for changes 
that could explain the link between birth weight and breast-cancer risk. 
The detailed life-course information that can be combined with the DNA 
“is really only available via these cohorts”, says Widschwendter.


The doctor’s son does better
than a dustman’s


Yet Kuh and others emphasize that fates are not fixed by early life. “I 
don't ever want the findings to be interpreted as purely determinis- 
tic,” says Kuh; she prefers the more optimistic idea that disease risks 
result from an accumulation of experiences throughout life, and that 
education, diet or other factors can shift poor 
trajectories to better ones. Marcus Richards, 
an epidemiologist who is leading the cognition 
studies on the group, points to evidence from 
the 1946 cohort — and supported by many 
other studies — that regular physical exercise 
in a person’s thirties and forties can slow their
cognitive decline with age. “We can take that 
research and say, here is very clear-cut evidence 
of something you can do to protect your cogni- 
tive health as you get older, and this is how you 
should do it,” says Richards.

The 1980s brought a vivid lesson in the power 
of environment. Hardly any of the Douglas 
babies, nourished on post-war rations, were 
fat as children — a sharp contrast to those of 
today — and they had maintained a healthy 
weight throughout young adulthood. But now 
incomes were climbing, eating out was more 
affordable, and cars were the way to get around. 
As the cohort approached their thirties, the line 
plotting the proportion who were obese edged 
upwards; in their late thirties it soared*. And 
although those in lower socioeconomic brackets 
did get fatter faster, no social class was immune. 

Somewhere on one of those curves is Mal- 
vern, who found her own weight creeping up 
when she moved to Luxembourg in 1992 and 
stopped work as a school bursar. She weighed 
11½ stone (73 kilograms) when she moved.
“When I came back in 2000 I was horrified: I 
was 15 stone. It was the pâté and the baguettes
and the cheese and having visitors,” she thinks,
on top of the menopause. Malvern has since lost
weight, and Ward has kept himself trim, he says, 
by living in the Peak District, where “you can't 
get anywhere without going up and down a hill”. 


Cleverness ‘delays
the menopause’


As women in the study reached their fifties, a 
more mysterious pattern emerged: those who 
had performed well on childhood intelligence 
tests tended to reach menopause several years 
later than those who had performed poorly’. “We tested almost to 
destruction every social and behavioural pathway; we threw almost 
everything we had at that to see if we could make that association go 
away and it didn’t,” says Richards. But once the researchers considered 
the association, it began to make sense. Their theory now is that child- 
hood cognition provides a readout of brain development, including 
that of some areas that respond to hormones or are responsible for hor- 
mone production. In short, high IQ scores could indicate a brain that 
was well-developed all round, and so was able to sustain reproduction
for longer. Kuh says that she has been testing whether genes are responsible,
“so far without success”.

Patricia Malvern aged 16; and aged 51, holding one of her grandchildren.

In 2005, as the cohort neared 60 and Wadsworth neared the end 
of his scientific career, the project’s future was again in jeopardy. The 
MRC was pondering whether to keep paying for it and, if it did, who 
should lead it. “We didn’t know if the study would be closed down — 
and Mike was retiring. It was a very unstable period,” says Kuh. 

Kuh — who had trained in economics — wanted to build up the bio- 
medical data that Wadsworth had been collecting. Until that time, all the 
examinations had been performed at the study
members’ homes, but by this stage the nurses
were staggering under all the equipment. To 
really understand the participants’ physiology 
and biology, Kuh argued, the study needed to 
get them to a clinic. “People appreciate a free 
bone scan,” she says. By 2008 she had convinced
the MRC to pay for every willing cohort mem- 
ber to visit one of a number of clinics around 
the country and had established a dedicated 
research unit, now housed in a Georgian ter- 
race in central London. 

Ward went to a clinic in Manchester for his 
exam. He learned that he has signs of osteo- 
porosis in his spine, and that he can no longer 
stand on one leg for long with his eyes closed. 
“You wobble rather more and I ended up hop- 
ping about the place.” He recalls the food diary 
he had to prepare as a “serious challenge”. “You 
don’t want to admit that you had that extra glass 
of plonk or another slice of cake, but you say, 
hang on, this is science, I’ve got to tell the truth.”

Kuh and her colleagues — the study now 
has about 25 full-time researchers and support 
staff and 100 collaborators — are still compil- 
ing such truths about their thousands of par- 
ticipants. “Now the cohort is one of the most 
phenotyped in the world,” says Kuh. Once her 
paper summarizing the latest data goes public,
Kuh is expecting the queue of epidemiologists,
geneticists and other scientists who want
to collaborate to lengthen, and last November 
she hired someone for three years especially to 
cope with the increased data sharing. As the 
cohort ages and falls ill, the study will continue 
monitoring participants’ health and trying to
tease out the influence of early experience. 
“One big question we can ask is, are these life 
effects we see in mid life going to wane?” says 
Kuh. Or will they, as some epidemiologists 
expect, get more dramatic with age? 

Kuh is also thinking about how best to exploit
genomic and other biomedical analyses.
At least one study has hinted at the power
of the cohort’s life-course data combined with 
genetics. Last year, Rebecca Hardy, a statisti- 
cian with the survey, published a study of two 
hot genes called FTO and MC4R, variants
of which have been identified as risk factors for obesity. When she
analysed DNA collected from the cohort in 1999, she found that the 
association of those variants with body mass index increased in early 
adult life, then weakened as the cohort grew older. Perhaps, Hardy 
speculates, any effects of the genes on appetite or fat storage were over- 
whelmed by that onslaught of fat-promoting influences in the 1980s, 
a possibility that might become clearer when she tests a further panel 
of obesity-linked genes. 

The research team never forgets to send birthday cards to the cohort.


CARDS & CALLS


How to keep a cohort together
— for 65 years


After tracking its subjects’ health and well-being for longer than any other
study, the 1946 British birth-cohort study has lessons to offer its younger
siblings. British birth cohorts were started in 1958, 1970 and 2000, and
another is provisionally planned. In the United States, children are being
enrolled in the National Children’s Study, which aims to follow some 100,000
children from before birth to age 21. Yet other cohort studies have been
felled by bureaucratic infighting, spiralling costs or a lack of sustained
funding. Diana Kuh, director of the 1946 survey, attributes its survival to
having fairly autonomous, dedicated leaders and a relatively low budget.
“We’ve always had to offer good value for money.”

The 1946 study also shows that building a strong relationship with the
participants is vital. Every year, the members receive a birthday card,
signed by the research team and telling them about the latest results. One
participant, Patricia Malvern, says that the card means a lot to her.
“Somehow, over the years I began to feel I knew the team members, although I
had never met any.” One year the card showed a sunset, and some recipients
complained about the suggestion that they were entering the evening of life.
Kuh and her colleagues responded to those complaints, like every other
enquiry from the participants, with personal letters or calls. Kuh says that
this relationship has been crucial to keeping an average of 80% of the
original cohort in the study. When leaders of other studies hear that figure,
she says, “people are amazed”.

But some factors in its success can’t be duplicated today. In 1946,
recruitment and consent issues were a lot simpler: “If someone was willing to
see you, that was consent,” says Kuh, “and the response rates were over 90%
probably because people didn’t think they could choose not to participate.”
Those simpler days also brought constraints. In the early years of the study,
questions about money and sex were “off the table”, illegitimate children
were turfed out of the study, and mothers were not asked whether they smoked
in pregnancy because “the minister for health was telling the soldiers to
smoke,” says Kuh. “But that’s part of being the history of science, really.”
H.P.


Ever protective of her study members and the limited DNA samples she has, Kuh
says that she views the latest molecular biology techniques
with caution. “I feel a huge responsibility to deliver,” she says. Quite
often, she says, outside researchers have an attitude of “give us all the 
cohort data and we'll rush this through and find millions of associa- 
tions. I say, well, that sounds very interesting; can you come back with 
a hypothesis?” Even so, when Kuh compiles a plan for the MRC’s five- 
yearly review of the survey in 2012, she knows that working out how 
to incorporate these technologies “is going to be key”. The falling cost 
of DNA sequencing means that ploughing through participants’ entire 
genomes is an almost inevitable step, she acknowledges. “The questions 
are, when is the best time — and what would we learn from it?” 


A survey taking on 


a life of its own 


For now, Kuh has more immediate planning concerns: five 65th-birth- 
day parties, at which the study members will meet each other for the 
first time. The parties are causing her some anxiety. Wadsworth had 
considered and rejected the idea of a 50th- or 60th-birthday bash, in 
case the get-together ended up influencing the participants’ life course
in some way. “Basically, we thought people might leave their partners
and get off with someone in the study,” he says. But Kuh decided that
recognizing and rewarding the members was worth the risk. (She even 
wrote to Buckingham Palace to request a garden-party invitation for 
the study members. “I wrote such a nice letter. I learned all about how 
to address the Queen, and I'm still hoping to get a reply.”) 

Ward and Malvern are pleased to have been part of the study. “It 
gives me a fair old bit of pride in a way,” says Ward. “Just things like 
bed-wetting. What did I contribute to the nation’s store of knowledge 
on bed-wetting?” Neither is perturbed by the idea of the researchers 
watching them until they crumble and die. “I suppose,” says Ward, “it
helps you accept that you’re mortal, you’re not going to last forever.”

Some 13% of subjects have died so far — and the study already has 
something to say about the fate of the rest. Kuh flips open some graphs 
of survival rates that she has calculated. They show the proportion 
of the survey members surviving up to age 60, separated by father’s 
social class. And they reveal yet another curious correlation for Kuh 
and her colleagues to dig into. Kuh points out a blue line representing 
a group of women from better-off backgrounds, whose death rate is 
about half that of everyone else. Kuh has not been able to attribute the
effect to less smoking or other obvious factors, and she suspects that 
these women took advantage of the educational and health opportuni- 
ties afforded by post-war Britain to improve themselves. “They really 
changed their lives with education. The girls, if they got through, they 
did really well.”

Yet the study is lending a touch of immortality to all its participants, 
whether men and women, born into comfort or poverty. Traces of them 
will live on in preserved DNA, cell lines frozen in liquid nitrogen — and 
in their records, now all transferred from punch cards to computers. 
“You’re very aware that your memory is going,” says Ward. “But you
also know that in the archive is a version of you.” 

“I often call it an alternative biography in there,” he adds, “and that
I’d quite like to get my hands on.” SEE EDITORIAL P.5


Helen Pearson is Nature’s chief features editor.


Research on
the reservation


American Indians have had some unhappy interactions 
with scientists in the past. Now, America’s tribal 
colleges are rapidly expanding their own research. 


By Zoé Corbyn 


Katie McDonald had never given much thought to the trout in Flathead Lake — except when fishing with her
family. She didn’t wonder about heavy-metal
pollution or how that might affect people eat- 
ing the fish. But that was before the then-19- 
year-old student started a bachelor’s degree in 
environmental science at Salish Kootenai Col- 
lege in northwest Montana and had to choose a 
research project. She saw that trout consump- 
tion was going up on the Flathead Indian 
Reservation, where she lived. Poor people, in 
particular, had begun to receive donated fish. 
So McDonald set out to see whether there was 
cause for concern. 

Her institution is a tribal college, one of 36 
scattered around the United States (see ‘US 
tribal colleges’) and serving some of the least- 
developed communities in the country. But 
thanks to several federal programmes seek- 
ing to boost science within tribal colleges, 
McDonald had access to equipment such as 


a state-of-the-art mercury analyser. She ran 
samples of the lake trout (Salvelinus namay- 
cush) and found surprisingly high levels of the 
toxic metal.

The results were compelling enough for the 
tribal government to advise women of child- 
bearing age to avoid eating older, larger fish 
from the lake altogether — a more stringent 
recommendation than state guidelines that 
suggest eating no more than one a month, says 
Barry Hansen, the tribes’ fisheries biologist. 

Douglas Stevens, head of life sciences at 
Salish Kootenai, says that McDonald’s work 
shows students how their scientific research 
can serve the local community. 

That message is a big change for American Indians, who have typically been
research subjects rather than investigators in studies ranging from
anthropology to genetics. And like indigenous peoples around the world,
American Indians have sometimes been treated poorly by the scientific
establishment. In a high-profile case last year, Arizona’s Havasupai Indian
tribe settled a lawsuit it had filed against Arizona State University in
Tempe for conducting genetic analyses that the tribe says were done without
express permission.


Native American researchers found raised levels of mercury in fish from
Flathead Lake, Montana, which borders the Flathead Indian Reservation.


That case and others have fostered a cli- 
mate of suspicion among some American 
Indians towards mainstream researchers. But 
tribal colleges are now trying to harness sci- 
ence for their communities’ own purposes by 
building up their capacity for both training 
and research. With an influx of funding from 
several federal agencies over the past decade, 
these institutions have started to hire more 
faculty members with research credentials, 
develop better facilities and establish science 
degree programmes. 

Although there are difficulties, particu- 
larly in research quality and publication rates, 
supporters say that the increasing focus on 
scientific research at tribal colleges is helping 
both students and their communities. 

It can be seen “as an act of resistance”, says
Luana Ross, the president of Salish Kootenai. 
“We are taking control of the research process.” 


DEMAND FOR DOCTORATES 
The emphasis on research is part of a broader 
set of changes at tribal colleges, most of which 
operate in self-governed nations. Unlike main- 
stream US universities, where undergraduates 
typically pursue four-year bachelor’s degrees, 
tribal colleges have traditionally offered only 
two-year degrees and vocational training. 
Because many of them serve relatively poor 
communities with struggling primary and 
secondary schools, tribal colleges must pro- 
vide remedial education to make up for gaps 
in students’ basic skills and knowledge. 

But several tribal colleges are also seeking
to raise the level of their instruction by hiring
teachers with more training. The percentage
of staff with doctorates at tribal colleges
rose by nearly 40% from 2003 to 2009, going 
from about 8% to 11% of the total, according 
to figures from the American Indian Higher 
Education Consortium, based in Alexandria, 
Virginia. 

The focus on science seems to be having an 
effect. Although enrolment at tribal colleges 
has been decreasing, the number of students 
pursuing degrees in science rose by more than 
70% between 2003 and 2009, to about 1,200 
students altogether. And eight tribal colleges 
now offer full four-year bachelor’s degrees, 
with about 70 applied-science bachelor’s pro- 
grammes available. 

The initiatives at tribal colleges are aided 
by a collection of programmes totalling about 
US$20 million annually, from federal agencies 
such as the National Science Foundation (NSF).

“One of the reasons for the phenomenal
growth in science enrolment at the tribal colleges
is because they are able to provide undergraduate
research opportunities,” says Jody
Chase, who manages the 
NSF’s Tribal Colleges and 
Universities Program. Since 
2001, that programme 
has provided $13.5 mil- 
lion a year in funding to 
strengthen science courses 
at tribal colleges and other 
institutions serving Native 
Americans in Alaska and 
Hawaii. 

Many of the research 
projects at tribal colleges 
focus on the local com- 
munity. Researchers at 
Diné College in the Navajo 
Nation of Arizona, for 
example, worked with sci- 
entists from the US Geo- 
logical Survey in Reston, 
Virginia, to investigate 
why residents in the Shiprock area of the res- 
ervation have roughly five times the rate of 
respiratory illness seen in nearby communities, 
despite a relatively low incidence of smoking. 
The area is home to some of the largest coal- 
mining and electricity-generating operations 
in the United States. 

By examining hospital records and moni- 
toring indoor air quality in more than 130 
homes, the researchers linked respiratory 
problems to high concentrations of par- 
ticulate matter from the burning of coal in 
stoves not designed for that purpose. The
coal is provided at low or no cost to Navajo 
living near coal mines, as part of reserva- 
tion lease agreements. The study has led to 
a large community-education campaign 
emphasizing, for example, the importance 
of leaving a window open. The college has 
also recommended that the tribe support a
stove-replacement programme.


America’s 36 tribal colleges climbed by 70%.

Scientists working outside the tribal col- 
leges say that the focus on research is raising 
standards at these institutions. “They have 
had some really good successes,” says David
Burgess, a Native American cell biologist 
at Boston College in Massachusetts, who is 
involved with the Society for the Advancement 
of Chicanos and Native Americans in Science. 
Burgess says that the presentations given by 
many tribal-college students at the society's 
annual conference are getting stronger. 


GLOBAL PHENOMENON 


The research expansion has parallels in other 
countries such as Norway, New Zealand and 
Canada, where universities serving indig- 
enous peoples are conducting studies on 
topics of local interest that would not otherwise
be explored. “It is a global phenomenon,”
says Boni Robertson, a professor of indigenous 
policy at Australia’s Griffith University in 
Queensland, and co-chair of the World Indig- 
enous Nations Higher Education Consortium. 

But along with their successes, America’s tribal colleges have run into some
hurdles in their scientific efforts. Enrolment in science programmes has
climbed, but the number of students completing two-year or four-year science
degrees remained essentially flat from 2004 to 2009.

There are also concerns about the research at these institutions. Barbara
Howard, a senior scientist at the non-profit MedStar Health Research
Institute in Hyattsville, Maryland, has worked in Indian country since the
late 1980s directing the Strong Heart Study, the largest epidemiological
study of Native Americans. Howard welcomes the rise in undergraduate research
at the tribal colleges, and says it is the best way to encourage students to
go on to graduate school. But, she says, the tribal colleges need to improve
in terms of their “research quantity and complexity”.

A major obstacle is that many faculty members don’t have the necessary
experience to undertake research — a problem that the American Indian College
Fund (AICF), based in Denver, Colorado, is trying to rectify. It awards
fellowships to faculty members at tribal colleges to start and finish PhDs
and do their own research. Yet although the organization recruits
intensively, each programme receives only a handful of applications.

Doing a PhD on a local Indian issue can be a tough slog, says Valerie (Pretty
Paint) Small, a faculty member at Little Big Horn College on the Crow
reservation in southern Montana. She has an AICF science fellowship to finish
her PhD at Colorado State University in Fort Collins, where she is studying
the invasion of a non-native tree species on Crow tribal lands. “So many
people know so little about our contemporary issues,” she says. “University
professors don’t make much of an effort to see where you might be coming from
if you are on the reservation.”

Faculty members at tribal colleges also struggle to publish their work, in
part because of large teaching loads. To improve publication rates, colleges
such as Salish Kootenai are forming writing groups for faculty members, and
administrators are exploring ways to give staff time off for research. And
the AICF hopes to start a peer-reviewed, interdisciplinary journal this year
that would publish research undertaken at tribal colleges.

Some scientists wonder whether tribal colleges would be better off expanding
their partnerships with research-intensive universities rather than trying to
do research on their own. “Why recreate those resources when they can partner
with other institutions and develop new kinds of synergy?” asks Spero Manson,
a Native American medical anthropologist who directs the Centers for American
Indian and Alaska Native Health at the University of Colorado in Denver.

But Daniel Wildcat, acting dean of natural 
and social sciences at Haskell Indian Nations 
University in Kansas, says that doing research 
within tribal colleges allows Native Americans
to “design their own research agendas”, in
which tribal values rather than those of outsiders
determine what gets studied. SEE EDITORIAL P.5


Zoé Corbyn is a freelance journalist based in
San Francisco.


A nineteenth-century engraving of machine-breakers attacking machinery in a textile factory.


In praise of Luddism 


Two centuries on from the Luddite insurrection, David Edgerton celebrates today’s 
most important opponents to new ideas, inventions and innovations: scientists. 


In March 1811, machine-breakers struck in the centre of England. They were
not the first or the last, but they started what became known as the Luddite
outrages or insurrection. The targets were employers and their machines —
stocking-makers and their knitting frames at first, later other textile
manufacturers and machines. The breakers were hand-knitters whose livelihood
was threatened. The name came from General or King Ludd, the leader the
Luddites invented as a signatory to proclamations.

Since then, especially in the late twentieth century, a Luddite has 
been someone opposed to progress, especially to science and technol- 
ogy. Nowadays, it is a generalized term of unthinking abuse designed 
to crush any criticism. 

In fact, opposition to most new ideas, inventions and innovations is 
essential for progress. Most grant applications and scientific papers are
rejected; most inventions have to be rejected if there is to be enough
time and money to develop any at all. Scientists have had a crucial role 
in this opposition — they led the charge against new gadget mania 
during the Second World War, and afterwards. 

If by ‘Luddism’ we mean, as was the case in 1811, opposition to 
specific novelties for particular reasons, as opposed to novelty in gen- 
eral, then Luddism is indispensable and scientists should cultivate 
their important, and venerable, role as its most rigorous practitioners. 

It is not sufficiently recognized that creation, scientific or otherwise, 
is a tragic business. Most inventions meet nothing but indifference, even 
from experts. Patents are little more than a melancholy archive of failure. 
Most ideas of every sort are rejected, as would be clear if there was a 
repository for abandoned drafts, rejected manuscripts, unperformed 
plays and unfilmed treatments. The reason is not hostility to novelty.
On the contrary, most new ideas and products must be rejected
because there are so many of them. In the rich world, some institutions 
and individuals have been so fecund with inventions that not even all 
the good ones could be used: there have been many processes to make, 
say, synthetic ammonia, or take colour photographs, but only a few are 
used. The point is not whether we reject, but how we do it, and why. 

Scientific Luddism, however, doesn’t even acknowledge its own 
existence. How could it, in a world in which science is held to be about 
creativity, innovation, the future, ideas, inventions and spin-outs? 
Party poopers are not welcome. 


WAR ON WASTE 

Science has a long and distinguished history of Luddism. In the early 
eighteenth century, some natural philosophers — such as the Royal 
Society's Jean Desaguliers — worked to discredit many projects and 
doubtless saved fortunes from being invested in perpetual motion 
machines. For example, with the rise of science-based industry in the 
late nineteenth century, chemicals companies employed scientists not 
just to control processes, or to create, but also to assess, and thus usu- 
ally reject, inventions. Within government, scientists were used to sift 
through the thousands of ideas for potentially war-winning gadgets 
that were received from patriotic inventors in both great wars of the 
twentieth century. 

In the Second World War, British scientists were actively involved in 
opposing new ideas for weapons. Surely there were enough Luddites in 
government and the armed forces in Britain that no help was needed 
from scientists? That certainly was the public view of many scientists 
who railed against administrators and politicians educated in the classics
and history. The reality was very different: the British political and
military elite (supported by many scientists of course) was addicted to new
machines, to machines that would transform war and allow a great scientific
nation to triumph over hordes of continental conscripts. Neville
Chamberlain, prime minister between May 1937 and May 1940, who
had a university education in science, was one such neophile, but in 
this as in so much else, he was overshadowed by his successor Winston 
Churchill. 

Churchill was a noted enthusiast for machines and an inventor 
himself. His close personal adviser on matters scientific, techni- 
cal and economic, was an Oxford professor, the physicist Frederick 
Lindemann, easily the most influential academic or scientist to have 
served in government. Their response to the crisis of 1940 — the fall of 
Norway, the evacuation of Dunkirk and the fall of France — involved a
call for more radical weapons. Between them they encouraged all sorts 
of new gadgets: aerial mines to bring down bombers, jet engines, the 
atomic bomb, anti-aircraft rockets, anti-tank devices of many kinds. 
Their enthusiasm was boundless, their progress-chasing relentless. 

Among the Luddites were the physiologist Archibald V. Hill, the 
chemist Henry Tizard and the physicist Patrick Blackett, all expe- 
rienced scientific advisers. Hill was elected by the graduates of the 
University of Cambridge to one of their two Parliamentary seats 
(a system abolished in the late 1940s) and was the only scientific Nobel 
laureate ever to sit in the Commons. He was a conservative, but one of
Churchill’s strongest opponents. Blackett was a socialist, who won his 
Nobel prize after the war. Tizard was the dean of scientific advisers, 
and associated with the most technically progressive part of govern- 
ment since the First World War — the air force. 

All three men turned against the inventors and the prime minister 
who so actively supported them. As Hill complained to Parliament in 
February 1942: “There have been far too many ill-considered inven- 
tions, devices, and ideas put across, by persons with influence in high 
places, against the best technical advice... They have cost the country 
vast sums of money and a corresponding effort in development and
production, to the detriment of profitable expenditure of labour and
materials elsewhere.” 

We know from Hill’s papers that he thought the greatest waste of 
money was the anti-aircraft rocket programme dating from the 1930s. 
He estimated that this giant effort cost the equivalent of between 3 and 
16 battleships, or the same number of very large factories, and consumed 
three or four times more cordite than used to fire the same number of 
conventional anti-aircraft shells. He called it a “most infernal waste of 
time, effort, manpower and material”. By June 1941, the government was 
demanding production of 9 million rockets a year, despite the fact that 
they barely worked. They are now all but forgotten. In fact, production 
never exceeded 2.5 million, and was saved by an unexpected new use 
for the rockets as ship- and tank-busters. 

Blackett, who headed operational research for the navy, engaged in a
general critique of the pursuit of novelty. Writing in December 1941, in
a paper setting out the principles of operational research, he criticized
the call for ‘New weapons for old’ as a form of “escapism”. Too little effort
was going into “the proper use of what we have got”, he wrote. Changing
tactics could be more effective than changing weapons. He and
Tizard wanted to redeploy scientists from research and development to 
“improve the operational efficiency of equipment and methods now in 
use”. Both men also opposed Britain building an atomic bomb, on the 
grounds that it was likely to take longer and cost more than promised. 
In this they were proved correct — there was no bomb until the US one 
of 1945, and far from being cheaper than conventional explosive, it was 
the most expensive ever made. The US bomb took at least 2 years longer,
and cost 50 times more, than the British bomb was meant to. 

Being a scientific Luddite was not easy. Charles Goodeve, who had 
been the Admiralty’s senior scientist during the Second World War, 
recalled that “the voices of reason” who opposed, on the grounds of 
cost, the extraordinary wartime scheme to build a gigantic aircraft 
carrier out of ice (codename Habakkuk) were “shouted down by cries 
of obstruction” in the internal deliberations of government. Goodeve 
estimated (although this is certainly an exaggeration) that Habakkuk 
was the most serious misallocation of Allied effort of any wartime 
invention. It was supported by scientists of distinction, most notably
the socialist crystallographer J. D. Bernal, but fortunately was not
pursued beyond the experimental stage. 

The Second World War has been treated as a moment of triumph 
for British science, and this is associated with a small range of well- 
known devices — radar, jet engines, penicillin, the Mulberry artificial 
harbour, the Pipe Line Under The Ocean (PLUTO) and sometimes the 
Habakkuk. Of these, only radar made a definite positive contribution 
to the war. Most of the rest were either irrelevant to it, or of marginal 
importance. British jet engines made no impact, nor did the atomic 
bomb, which marked rather than caused the end of the war. The two 
Mulberry harbours towed to the Normandy beaches, although much 
celebrated, contributed less than propaganda implies then and since. 
The Americans managed perfectly well after the Mulberry built for them 
was destroyed in a storm before it was even finished. The PLUTO, which 
took petrol across the English Channel, although built at great expense, 
was, as US Luddites had suggested, quite unnecessary and, furthermore, 
worked very badly. 


Churchill’s Luddites: Archibald V. Hill, Patrick Blackett and Henry Tizard.

A ‘Z’ battery of anti-aircraft rockets in action in 1944. Physiologist Archibald V. Hill declared them an “infernal waste of time, effort, manpower and material”.


The reality is that the British war effort would have been more 
effective had none of these projects gone ahead and the vast cost of 
development and deployment been spent elsewhere. The lesson is this: 
not every famous technology is important. 


LUDDITES NEEDED 

After the Second World War, Britain could have done with more 
scientific Luddites. Successful opposition to British nuclear power 
stations in the 1950s and 1960s would have saved the nation billions of 
pounds, because they generated electricity more expensively than need 
have been the case. A stronger opposition to Concorde (the dissenters 
included the future Nobel laureate, physicist Nevill Mott) would have 
deprived some rich people of time spent in the air, and the British and 
French taxpayers could have spent more on developing other modes 
of transport, perhaps speeding up the development of trains. 

Today publicly funded researchers are under pressure to identify 
the probable ‘impact’ of their work. Instead of making fun of scientists 
for pooh-poohing the economic prospects of their own discoveries 
— Ernest Rutherford describing atomic energy as “moonshine” is the 
much-repeated example — we should celebrate the unseen work that 
rightly stops the great majority of ideas and inventions from getting 
anywhere. That great British innovation, the National Institute for 
Health and Clinical Excellence (NICE), which brings evidence to bear 
on medical procedures and drugs, is under huge pressure from inter- 
ested pharmaceutical firms and sponsored patients’ groups to endorse 
products of doubtful utility. We need much more serious opposition 
to the gigantic waste of innovative resources that goes into ‘me too’ 
drugs. We also need to reject the fake technical fixes that are every- 
where on offer — often the problem is not lack of technical means, but 
something else entirely. Feeding the world might benefit from genetic 
modification technology, but it will not be achieved if the only change 
is more genetically modified crops. 


Above all, we should reject the idea that even the original Luddites 
were opposed to progress, or the machine. In response to a govern- 
ment bill to make machine-breaking a capital offence, the poet Lord 
Byron explained to the House of Lords that the Luddites imagined 
that “the maintenance and well-doing of the industrious poor were 
objects of greater consequence than the enrichment of a few by any 
improvement in the implements of trade”. Yet, said this great defender 
of the Luddites, “the adoption of the enlarged machinery ... might have 
been beneficial to the master without being detrimental to the servant”, 
but the state of the economy at the time meant that this was not the 
case. Similarly, most people said to be Luddites today are not against 
progress or science and technology in general, but against particular 
manifestations in particular contexts, just as scientists are. 

Scientists should embrace and refine their Luddite sides — as a 
public service, and as a service to knowledge and invention. Using 
their authority, they should insist on the difference between science 
as a whole and a particular part, and on the necessity of nay-saying. That 
would in itself help to raise the level of elite (let alone public) discussion 
above its current, depressingly low level. 


David Edgerton is at the Centre for the History of Science, 
Technology and Medicine at Imperial College London, London SW7 
2AZ, UK. His book Britain’s War Machine: Weapons, Resources and 
Experts in the Second World War will be published in the United 
Kingdom on 31 March by Allen Lane and in the United States in 
August by Oxford University Press. 

e-mail: d.edgerton@imperial.ac.uk 


Training for Africa’s women farmers is vital if the continent is to shift to more sustainable agriculture. 


AGRICULTURE 


A bowl half full 


Calestous Juma’s vision for African farming is 
refreshingly optimistic, finds Camilla Toulmin. 


Innovation, entrepreneurship and 
investment lie at the heart of Calestous 
Juma’s upbeat assessment of the future 
of African agriculture. In The New Harvest, 
Juma, an expert in international develop- 
ment, shows how agricultural science, the 
business of development and the institutions 
that shape food markets are transforming 
the opportunities of farmers and traders 
across the continent. 

His optimism is refreshing — a welcome 
antidote to the pessimistic view of African 
development of previous decades. Forecasts 
predict annual economic growth of 5-8% 
for many African countries over the next 
few years. But important questions remain 
over how the power and benefits that will 
come from this growth will be distributed. 

Juma’s book is timely. The renewed vola- 
tility in food and fuel prices is prompting 
worries about food security and the associ- 
ated risks of political turmoil, as seen in North 
African countries. Agriculture is firmly back 
on the international agenda. Rising food 
prices are bad for poor people, but bring 
extra revenue for those producing a surplus. 
Agricultural land is now an asset, prompt- 
ing a ‘land grab’ by companies and foreign 
governments who have taken advantage 
of cheap land and poorly governed 
natural-resource rights in parts of Africa. 

The New Harvest: Agricultural Innovation in Africa 
CALESTOUS JUMA 
Oxford University Press: 2011. 296 pp. $19.95 

Juma sees regional economic organizations, 
such as the Common Market for Eastern and 
Southern Africa (COMESA), as essential in 
promoting innovation. By aggregating markets 
and pooling research resources, benefits 
can be spread among countries, giving 
small African nations the economies of scale 
needed to compete and diversify. Although 
many regional bodies have been criticized 
for the gap between their aspirations and 
achievements, and for their overlapping and 
competing mandates, Juma is positive about 
their future role, citing cross-border planning 
and investment in electricity and gas distri- 
bution as an example of good practice. 

The book is filled with examples of 
improvements in food and farming systems, 
where governments have energized local 
groups. For instance, the University of 
Agriculture in Abeokuta, Nigeria, and the food 
company Nestlé are working together on 
soya beans and cassava to improve varieties 
and productivity, thus increasing incomes 
for Nigerian farmers. Similar projects have 
enhanced the take-up of improved rice 
varieties with higher yield and greater stress 
resistance in Benin, Ghana and Ethiopia. 

Philosophically, Juma takes an eclectic 
approach: he draws on three strands of argu- 
ment. First, he embraces the opportunities 
of an outward-looking, market-oriented 
farming system in which productivity 
grows through new science-based breeds 
and seeds. Second, he accepts the need for 
food security and the strengthening of local 
knowledge networks, indigenous systems 
and varieties. Third, he acknowledges that a 
combination of population growth and lack 
of innovation has led to stagnant conditions 
and African reliance on food imports. 

Juma recognizes that past governments 
have drained the agriculture industry of rev- 
enues in favour of building up other sectors. 
Long-term neglect has meant that irrigation 
covers only 4% of Africa's cropped area, and 
fertilizer use in the continent is one-tenth of 
the world average. But, he argues, the latest 
generation of African leaders offers a greater 
commitment to invest in agriculture. 

Juma is keen on applying technology to 
agriculture. He looks to advances in tissue 
culture and breeding, such as using a genetic 
or morphological marker to select for desir- 
able traits, and to cheaper genetic mapping 
of Africa’s crops for improved productivity. 
He brushes aside sceptics’ reservations about 
genetically modified crops and sees instead 
the promise of lower pesticide use and higher 
yields that companies such as Monsanto have 
brought to smallholder farmers growing cot- 
ton in Burkina Faso and South Africa. But 
he does not address the main concern about 
genetically modified crops, namely the con- 
centration of economic power held by a few 
major agrochemical companies. 

He lauds the now-ubiquitous mobile tele- 
phone for addressing old problems in new 
ways, and for transforming people's access to 
market information and financial services. 
But Juma reminds us that fundamental invest- 
ments are still needed in basic infrastructure, 
such as road networks, to get supplies to farm- 
ers and harvests to market. Reducing trans- 
port costs would greatly increase farmers’ 
ability to respond to market demand. 

To foster innovation, Juma champions 
a “cluster approach”, in which groups from 
government, the private sector, civil society 
and researchers come together in a variety 
of partnerships to work on common problems, 
building trust through proximity. In the 
past, he explains, the state was too dominant: 
public-sector-led approaches to 
agricultural research assumed a pipeline 
of technology from lab to farm. In future, 
the farmer should be seen as a co-producer 
of research, not a recipient of handed- 
down technology. China is fostering this 
approach. Juma gives a good illustration 
of the boom in vegetable production in 
China’s Shandong province, where local 
government has encouraged a range of 
businesses and market development. 

With its climate changing and popula- 
tion growing, Africa will have to produce 
food under greater environmental pressure. 
Juma argues that agriculture needs to 
shift towards more sustainable farming 
patterns, which are also more knowledge- 
intensive. This will require governments 
and societies to address the deficit in train- 
ing and access to education, particularly for 
women, who are the backbone of food pro- 
duction in many African farming commu- 
nities. For example, the non-governmental 
Uganda Rural Development and Training 
programme teaches farming methods 
using a curriculum that focuses on build- 
ing strong female agricultural leaders. 

Weak infrastructure and scarce land 
and water may be harder to overcome 
than Juma suggests. A few African govern- 
ments recognize the potential of agricul- 
ture to drive economic growth and reduce 
poverty, but there is still a long way to go. 
Climate-change impacts could be hugely 
damaging unless urgent work is done to 
construct resilient local and national sys- 
tems. A focus on the benefits of global 
markets, inward investment and modern 
technology needs to be balanced by con- 
sideration for who gains and who loses. 

Concerns are already being raised by 
farmers’ associations about large-scale 
investment in agricultural land, in ‘land- 
grab’ deals negotiated by few people. In 
Mali, farmers want a moratorium on 
large land allocations, and have issued a 
call to remind the government that land, 
water, forests and natural resources consti- 
tute national assets for all citizens. Other 
investment models, such as contract farm- 
ing or joint ventures with local farmers, 
should be considered. Transparency is also 
needed to ensure that investors undertake 
their contractual obligations and do not 
engage in speculation. 

Like Juma, I see the glass as half full, 
but there are many challenges ahead. 
Nevertheless, The New Harvest reminds 
us that by working with farmers, non- 
governmental organizations, government 
and business, science has the potential to 
transform Africa’s food security. 


Camilla Toulmin is director of the 
International Institute for Environment 
and Development, London, UK. 


Greenland gives an alarming assessment of climate change in a production as complex as the topic itself. 


Poles apart on climate 


Two contrasting plays highlight the difficulties of 
putting global warming on stage, finds Kerri Smith. 


Intense debate has polarized the issue 
of climate change — and two plays 
currently running in London reveal 
many facets of those arguments. Greenland 
is a rational but disjointed assessment of 
how urgent and alarming our predicament 
is, whereas The Heretic is an entertaining 
family drama with a climate sceptic as the 
protagonist. 

Greenland, at the National Theatre, 
is a production almost as complex and 
unwieldy as climate change itself. It weaves 
together several narratives: a student-teacher 
becomes a green activist; a birdwatcher wit- 
nesses habitat change in the Arctic; a couple 
argue over their individual contributions to 
global warming. These unfolding tales share 
the stage with falling rain and a remarkably 
life-like model of a polar bear. 

The most engaging scenes involve the 
play’s climate modeller, Ray, and a gov- 
ernment official, Phoebe, sent to gather 
data ahead of the December 2009 climate 
negotiations in Copenhagen. She arrives at 
the lab after he has worked all night on his 
model; he is reluctant to let her see his work 
before it has been peer reviewed. When 
they get to Copenhagen, we are given a 
sense of the convoluted processes involved 
in drafting an international policy agreement 
when a dozen weighty volumes fall 
from the ceiling and land with a thud. 

Greenland 
MOIRA BUFFINI, MATT CHARMAN, PENELOPE SKINNER AND JACK THORNE 
National Theatre, London. Until 2 April 2011. 

The Heretic 
RICHARD BEAN 
Royal Court Theatre, London. Until 19 March 2011. 

But the multitude of characters and 
jumbled storylines make this play difficult 
to follow. Laced with statistics and quotes, it 
feels at times like a lecture. Greenland’s four 
writers — Moira Buffini, Matt Charman, 
Penelope Skinner and Jack Thorne — spent 
months researching the topic by interview- 
ing experts, activists and journalists. The 
team hoped to convey the complexity of the 
issue, says the play’s artistic director, Ben 
Power. “We're trying to explore the feeling 
of powerlessness,” he adds. What they actually 
depict, in shoehorning all their research 
onto the stage, is confusion. 

Richard Bean’s The Heretic is easier to 
watch, with its linear storyline, entertaining 
characters and laugh- 

fictitious professor Diane Cassell, who 
studies sea-level change in the Maldives. Her 
data suggest that there is no rise — putting 
her at odds with her department and making 
her a target for death threats from an envi- 
ronmental activist group. She infuriates her 
colleagues even further when she defends her 
views on a television show hosted by BBC 
Newsnight presenter Jeremy Paxman — play- 
ing himself in a pre-recorded video cameo — 
leading to a dramatic turn of events. 

Cassell also tutors a student with strong 
environmentalist leanings and helps her 
own daughter, a Greenpeace member, to 
battle anorexia. One section draws on the 
e-mail hacking controversy of November 
2009 at the University of East Anglia, UK. 
Cassell’s student hacks into another uni- 
versity’s mainframe and discovers e-mails 
in which the author was keen to ‘bury 
the downturn’ — a reference to “hide the 
decline”, a phrase in the real hacked e-mails 
that was seized upon by climate sceptics. 

The problem with The Heretic is that 
although the ‘science’ presented is sloppy in 
places, its mouthpiece, Cassell, is likeable, 
witty and compelling — perhaps enough 
to convince the audience that the science is 
sound. Cassell argues, for instance, that the 
research on sea levels that went into reports 
from the Intergovernmental Panel on Climate 
Change “used a single tide gauge”, 
rather than the many records that climate 
scientists actually collected. Interviewed 
after the play, environmental economist 
Dimitri Zenghelis of the London School of 
Economics, who consulted on Greenland, 
voiced concerns about the misinformation 
that Cassell’s character helps to propagate. 

Both plays do a good job of portraying 
their scientific protagonists as people. In 
Greenland, climate scientist Ray worries 
whether it is irresponsible to start a family 
given future climate risks. Cassell in The 
Heretic grapples with family and romantic 
dramas as well as her scientific dilemma. 
Zenghelis says one helpful aspect of The 
Heretic is that Cassell’s character identifies 
“the problem of objective scientists without an 
agenda struggling to be heard”. But in the real 
world, it is not the sceptics who have trouble 
getting their message out: “[The Heretic] got 
things the wrong way around,” he says. 

On the evidence of these two plays, 
climate science and theatre do not seem to 
be natural bedfellows. But like the Iraq War 
or the Enron financial scandal (both subjects 
of recent plays), complex topics that affect 
everyone should be dramatized. They just 
need to be accurate as well as entertaining. 
“People said to us, ‘For God’s sake make it 
an interesting play! Don't lecture us’,” Power 
admits of Greenland. In the end, The Heretic 
meets this target. Greenland falls short. 


Books in brief 


The New Cool: A Visionary Teacher, His FIRST Robotics Team, and 
the Ultimate Battle of Smarts 

Neal Bascomb CROWN 352 pp. $25 (2011) 

Robot-building competitions are ‘the new cool’ in high schools 
across the United States. Writer Neal Bascomb follows a team 
of California teenagers and their inspirational physics teacher as 
they try to win the coveted FIRST (For Inspiration and Recognition 
of Science and Technology) contest, a nationwide annual project 
instigated 22 years ago by inventor Dean Kamen. In relating the 
team’s travails, Bascomb shows how children are enthused by 
hands-on approaches to science and technology. 


Moby-Duck: The True Story of 28,800 Bath Toys Lost at Sea and 
of the Beachcombers, Oceanographers, Environmentalists, and 
Fools, Including the Author, Who Went in Search of Them 
Donovan Hohn VIKING 416 pp. $27.95 (2011) 

After hearing about thousands of plastic toys washed up on Alaskan 
shores after the loss of a container from a Chinese ship, journalist 
Donovan Hohn set out to learn about ocean currents. Retracing the 
journey of the plastic ducks, frogs and turtles across the Pacific, he 
reveals how floating markers have been used to map the circulation 
of the seas. And he questions the globalized economic system that 
sends cheap novelty products on such odysseys in the first place. 


Driven to Extinction: The Impact of Climate Change on Biodiversity 
Richard Pearson STERLING 264 pp. $22.95 (2011) 
Global warming will result in winners and losers among species, 
explains Richard Pearson, a biogeographer at the American 
Museum of Natural History in New York. Offering a balanced 
assessment of case studies of animals and ecosystems that 
are already affected by environmental degradation — such as 
Madagascan geckos, coral reefs and polar bears — he relates how 
climate change will sever links between organisms. This will lead 
to inevitable extinctions, he admits. But new niches will emerge in 
which other species might flourish. 


The Beautiful Invisible: Creativity, Imagination, and Theoretical 
Physics 

Giovanni Vignale OXFORD UNIVERSITY PRESS 320 pp. $34.95 (2011) 
Physics is much more than just dry mathematics, argues physicist 
Giovanni Vignale. Its abstract concepts, such as energy and atoms, 
are products of the imagination that call for a creative approach, and 
are best viewed as cultural hand-me-downs that have developed 
from philosophical ideas throughout the ages. In his thoughtful 
and wide-ranging book, Vignale explores the esoteric side of the 
discipline, which he sees as “the military academy of liberal arts” 
owing to its mix of rigour and creativity. 


The Kaguya Lunar Atlas: The Moon in High Resolution 

Motomaro Shirao and Charles A. Wood SPRINGER 174 pp. $39.95 
(2011) 

Lunar landscapes take on a new realism in this atlas of photographs 
taken by the high-definition television camera aboard the Kaguya 
(SELENE) spacecraft, operated by the Japanese space agency 
JAXA. The oblique views, snapped by the low-flying probe from just 
100 kilometres above the Moon’s surface, show the terrain as it 
would be seen by astronauts descending to its surface, rather than 
the vertical views presented by other satellites. 


Otzi the Iceman has been ‘reincarnated’ by palaeontological artists Alfons and Adrie Kennis using forensic findings as well as artistic inspiration. 


ANTHROPOLOGY 


The Iceman defrosted 


Marta Paterlini reports on an exhibition marking 20 years since Otzi, one of the 
world’s oldest natural mummies, was discovered under the Alpine ice. 


As dead celebrities go, Otzi the Iceman 
must be one of the most closely studied 
— he has been measured, X-rayed and 
dated. But the 5,300-year- 
old mummified corpse, found part-buried in 
ice on the Tisenjoch Pass in the Alps spanning 
the Italian–Austrian border in 1991, 
still holds surprises. Many of his secrets are 
revealed in Otzi20, a major exhibition that 
opened this week at the South Tyrol Museum 
of Archaeology in Bolzano, Italy, to mark the 
20th anniversary of his discovery. 

Wounded by an arrowhead in his left 
shoulder, Otzi is thought to have frozen 
to death while fleeing attackers. Much of 
the analysis so far has concentrated on the 
belongings found with him, but this has 
shifted. “So far the attention has been on 
Otzi’s clothes and tools. Now, the physical 
body becomes the focus,” explains museum 
director Angelika Fleckinger. 

Central to the exhibition is a new reconstruction 
of his body by twin brothers 
Alfons and Adrie Kennis, Dutch palaeontological 
artists who previously put a face to 
Neanderthal man. The artists reconstructed 
Otzi’s body by comparing his bone 
measurements, such as femur length, to 
those of men today. They sculpted 
muscles from modelling clay, attaching them 
to an appropriately sized skeleton. Using a 
polyurethane mould, they crafted a silicone 
torso, adding legs in resin and plastic. The 
model is finished with five thin layers of 
silicone ‘skin’, each painted individually. 

Otzi20: Life, Science, Fiction, Reality 
South Tyrol Museum of Archaeology, Bolzano, Italy. 
Until 15 January 2012. 

The skull was made using accurate three- 
dimensional computerized tomography 
scans of Otzi’s head as a guide. Ultrasound 
measurements of skull morphology and 
average skin and flesh thickness were used 
as the basis for modelling his facial tissues 
— a technique used in forensic medicine to 
reveal injuries. Together with traces of some 
mummified characteristics, “all these data 
gave us an estimate of his portrait, complete 
with wrinkles, hair and eyelashes,” explains 
Adrie Kennis. 

“What I found peculiar was the small 
nasal cavities,” says Adrie. This trait, along 
with his fine bones, means that Otzi would 
have looked fragile, he adds. The artists 
also think that he would have appeared 
older than someone in their mid-forties 
today, because his features would have been 
ravaged by greater exposure to the harsher, 
hotter climate of the time. 

The reconstruction team had many dis- 
cussions about the precise moment at which 
to depict him. “We agreed to stage it a day 
before his death, when he is wandering up 
to the mountains, a spark of stress on his 
face,” Adrie explains. Otzi would have been 
uncomfortable — he was wounded and 
on his own, perhaps being followed. This 
sombre picture contrasts with his smiling 
face in the museum’s earlier model. 

Even more striking is the colour of Otzi’s 
eyes: not blue, as in the 

extracted from a sample of pelvis bone. 

When the mummy was defrosted in 
November 2010 for the first time since 
its discovery, researchers found that the 
stomach was filled with matter (previous 
analyses had been limited to the intestine). 
Using histological, morphological, DNA and 
botanical analysis, they aim to determine 
which bacteria Otzi was carrying at the time 
of his death — information they hope will 
improve their conservation strategy and hint 
at his dietary habits. 

Aside from his recent thaw, Otzi is usually 
kept at -6°C and 98% air humidity, and is 
misted with water once a month. The drop- 
lets freeze on the surface of the body, pre- 
serving it in a thin shell of ice. The crystals 
on his skin are visualized in an installation by 
British artist Marilène Oliver, also on display 
in the exhibition. In Otzi: Frozen, Scanned 
and Plotted (2007), Oliver converted a com- 
puterized tomography scan of the frozen 
body into an image by drilling some 50,000 
holes into 80 acrylic sheets that were then 
stacked into a translucent three-dimensional 
block. The result is a ghostly impression of 
Otzi’s form. 

Otzi20 embraces the full spectrum of the 
Iceman’s discovery, his life and the media cir- 
cus and scientific sleuthing that has followed. 
With plans to update exhibits throughout 
the year, the show provides a focus for the 
new scientific findings that are contributing 
to the emerging picture of Otzi. 


Marta Paterlini is a writer based in Stockholm. 
e-mail: martapaterlini@gmail.com 


Q&A Manolis Papagrigorakis 


Facing the past 


The Athens-based orthodontist explains the art and science of 
reconstructing the heads of long-dead people from their skulls alone, 
including that of Myrtis — a young girl from more than 2,000 years ago, 
whose recreated face is our first glimpse of an ordinary ancient Greek. 


Why did you decide to reconstruct an 
ancient Greek face? 

For 30 years I have been combining my 
science, which deals in the bone struc- 
ture of the lower face, with my hobbies of 
history and art, by studying the craniofacial 
complex of ancient Greeks. When Myrtis’s 
unusually intact skull was discovered, I saw 
it as a great opportunity to reveal what an 
ancient Greek layperson looked like for the 
first time. 


How did you feel when you first saw the 
finished picture of Myrtis? 

It was very emotional to come face to face 
with someone who could have been your 
80 times great-grandmother and at the 
same time your granddaughter, because 
she really resembles today’s 
children. Our detailed 
reconstruction was pub- 
lished in the January 2011 
issue of The Angle Ortho- 
dontist. 


Where were Myrtis’s bones 
found? 

The building of the Athens 
Metro in 1994-95 brought 
to light a mass grave in what 
was once the public cem- 
etery of ancient Athens. 
Archaeologists found at 
least 150 skeletons, appar- 
ently hastily buried. The site 
was dated to 430–426 BC, 
when Athens was besieged by the Spar- 
tans during the Peloponnesian War and an 
unknown epidemic struck the city. 


How did you become involved in the 
reconstruction? 

The archaeologists asked me to examine 
various bones, which we knew came from 
victims of the mysterious disease. Within 
the tooth pulp of three different skulls, 
we found genes that matched those from 
a bacterium called Salmonella enterica 
serovar Typhi, suggesting that the victims 
died of typhoid fever. 


Myrtis was rebuilt from a skull. 


What drew you to Myrtis’s remains? 
One skull was small, belonging to a child, 
and I saw something I hadn't seen in the 
other skulls unearthed from the mass grave 
— its jaw bore both permanent teeth and 
part of its deciduous (baby) dentition. The 
morphology of the front part of the lower 
jaw and brow ridge, as well as the size of 
the lower canine teeth, told us the sex. We 
deduced her age using X-rays to look at 
how complete the roots of her teeth were. 
This suggested that the skull belonged to 
an 11-year-old girl, to whom we gave the 
old Greek name Myrtis, meaning myrtle. 


How did you reconstruct her face? 

We placed numerous markers on her skull 
to reflect the average tissue depth across 
the face, according to data tabulated for 
people of various ages and of each gender. 
The Swedish sculptor Oscar 
Nilsson formed 20 anatom- 
ically correct muscles using 
clay, and worked from the 
skull outwards until the 
tissue depth reached the 
markers. He gave her brown 
eyes, taking her Greek 
origin into account. The 
hairstyle and expression 
were decided after studying 
sculptures and depictions of 
children living at the same 
time as Myrtis. 


Which features are the 
hardest to recreate? 

The weak points are the 
ears, the tip of the nose and lips, where 
there is no bone — only soft tissues and 
cartilage that have disappeared. I used 
her dental arch to define the shape and 
position of her lips, and here my special- 
ity helps. The coexistence of her adult and 
baby teeth creates the look of an overjet, 
where the top teeth project forwards. 


What would her life have been like? 

We only know that she lived around 430 BC, 
when many of the values that sustain con- 
temporary civilization were established. 
She probably witnessed the building of the 
Parthenon on the Acropolis in Athens. 


INTERVIEW BY ALISON MCCOOK 


Curb deep-sea 
mining now 


Cindy Lee Van Dover in her 
review of the Bismarck Sea 
mining project (Nature 470, 
31-33; 2011) accepts the 
inevitability of interest in 
excavating the sediments of 
hydrothermal vents for minerals 
such as copper, zinc, gold and 
silver. Many of the hundreds 
of these sites are accessible, 
and the issue is widely seen as 
not whether mining should 
proceed, but how it can be done 
profitably and safely. 

I approach the issue with a 
strong bias, based on efforts 
over decades to figure out how 
to keep the world working as a 
biophysical system capable of 
serving indefinitely as a human 
habitat. On the overall issue 
I am not optimistic. On one 
topic, however, I am certain: 
the integrity of the oceanic 
biophysical system is being 
lost now and the human cost is 
overwhelming. 

The fact is that intrusions into 
the global environment have 
passed a limit of acceptability 
and this one must be seen for 
the twofold attack on the global 
commons that it is. 

Hydrothermal vents are 
one of the wonders of Earth: 
communities of autotrophic 
organisms that survive on 
Earth’s energy as opposed to 
photosynthetic energy from 
the Sun, the source of energy 
of almost all other life. Each 
vent site may have its own 
high degree of endemism, 
essentially unique life. The 
mere fact that the sites are 
commercially attractive as ore 
is not an adequate reason to 
exploit them, any more than the 
existence of the giant redwoods 
of the Sierra Nevada justifies 
harvesting them for shingles. 
The vents are a window onto 
the history of life. By what 
right do we destroy them for 
corporate profit? 


Worse, mining of marine 
sediments mobilizes the 
noxious minerals they contain, 
including those that are toxic to 
other marine life. Suddenly, we 
have another contribution to 
the chemical disruption of the 
ocean. We are well aware of the 
process and its effects. Do we 
need more research to confirm 
our experience? 

Global problems all have 
local origins. Here we have the 
beginning of another process we 
shall never be able to stop, once 
started. Another mountain-top 
mining. Another Tar Sands of 
Alberta. Another North Slope 
oil development. Scientists 
who join the programme are 
offering tacit approval of it, no 
matter what their perspectives. 
The world is too small for this 
further destructive intrusion — 
it should be stopped now before 
it becomes another corporate 
atrocity, too big and too valuable 
to stop. 

George M. Woodwell Woods 
Hole Research Center, USA. 
gmwoodwell@whrc.org 


NIH plan will hinder 
translational studies 


The proposal by the US 
National Institutes of Health 
(NIH) to dismantle the 
National Center for Research 
Resources (NCRR) (see 
go.nature.com/yw3cq3) is more 
likely to inhibit than enhance 
translational research. 
Through its Division of 
Comparative Medicine (DCM), 
the NCRR has long promoted 
translational research by 
supporting facilities and by 
providing resources and training 
to identify and target disease 
mechanisms. The proposed 
replacement for the NCRR, the 
National Center for Advancing 
Translational Sciences 
(NCATS), acknowledges the 
value of an integrated DCM by 
retaining its core functions as a 
cohesive programme within an 
‘Infrastructure Entity’. 
However, in our view, 
the vision of NCATS as an 
incubator for innovative 
medicines is unrealistic. A 
major obstacle to developing 
new treatments through 
translational science is an 
inadequate understanding of 
basic biological pathways and 
mechanisms — not anaemic 
efforts by industry to test 
potential drug candidates. 
Using the NCRR’s existing 
research resources as a means of 
enhancing the NIH’s traditional 
strength in mechanistic 
research is a more certain route 
to translational success than 
focusing on chemical screening 
and intramural bioassays, as 
proposed for NCATS. 

As veteran comparative 
biologists, we feel that the 
decision to slash the NCRR to 
initiate NCATS was undertaken 
without due diligence or 
sufficient opportunity for 
public debate. The rush to 
establish NCATS without a 
settled organizational plan and 
against the advice of numerous 
translational science researchers 
bodes ill for the new centre’s 
ability to perform meaningful 
translational research in the 
foreseeable future. 

The preservation of the DCM 
in the Infrastructure Entity will 
maintain core NIH translational 
science functions. The sprint to 
form NCATS by dismembering 
the NCRR might be good 
politics, but it is bad public 
policy. 

Brad Bolon on behalf of 25 
co-authors*, GEMpath, 
Colorado, USA. 
brad@gempath.net 

*See http://dx.doi.org/10.1038/471036b for a full list of signatories.
SEE NEWS P.15 


Neuroscience cuts 
will hurt key areas 


We call on the UK Biotechnology 
and Biological Sciences Research 
Council (BBSRC) to reconsider 
its intention to cut funding for 
neuroscience by around 20%
(see go.nature.com/u4mgyq and
go.nature.com/8ig9oy).

The neuroscience currently 
funded by the BBSRC must 
survive a rigorous committee 
selection process. According to 
the research council, the cut is 
being imposed not because the 
neuroscience funded is less than 
excellent, but because it does not 
address BBSRC priority areas. 

Yet neuroscience research is 
crucial in every BBSRC priority 
area. In public health, it can 
improve the understanding 
of mental illness, age-related 
cognitive decline, and diet and 
exercise factors (through the 
neural basis of food selection and 
motivation, respectively). It can 
improve animal welfare by giving 
insight into the mental state of 
farm animals, and is relevant to 
food security — for example, 
by controlling crop predation 
through knowledge of the neural 
basis of insect behaviour. 

The BBSRC funds so much of 
this research because of the high 
quality of British neuroscience 
and because its researchers have 
consistently proved that they can 
compete for funding. 

So far, the BBSRC has been 
admirably responsive to research 
excellence on the ground, and 
open to going where scientists 
lead. This imposition of funding 
priorities from the top is a
regrettable change. 

Peter A. McNaughton, Trevor 
W. Robbins University of 
Cambridge, UK. 
pam42@cam.ac.uk 


Easier citizen 
science is better 


Non-scientists are now 
participating in research in 
ways that were previously 
impossible, thanks to more 
web-based projects to collect 
and analyse data. Here we 
suggest a way to encourage 
broader participation while 
increasing the quality of data. 
Participation may be passive, 
as when someone donates 
their computer's ‘downtime’ to 
projects such as SETI@home, 
or active, as when someone 
uses eBird to log birds they 
have spotted. Unfortunately,
the prevailing data-collection
and storage practices for active 
projects inhibit participation by 
non-experts. 

Many projects rely on positive 
identification, whether explicitly 
(as for eBird) or by soliciting 
photographs and descriptions 
that others can use to classify 
the observation (as for the UK 
website iSpot). Because non- 
experts often lack the knowledge 
to identify species, they may opt 
not to participate or may provide 
inaccurate data by accidentally 
misidentifying something. The 
result is a trade-off between 
participation and data quality. 

This trade-off can be avoided 
simply by changing the way 
information is collected and 
stored. Participants should be 
given the option to report a 
sighting in terms of observed 
attributes, eliminating the need 
to force a (possibly incorrect) 
classification. For example, 
allowing someone to report a 
bird as oil-covered may be more 
valuable than asking them to 
guess what the species is. For 
such data to be used effectively, 
they need to be stored in a way 
that supports attributes rather 
than fixed, predetermined 
classes. 
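
As a rough illustration of what attribute-based storage could look like, the Python sketch below keeps whatever attributes an observer reports and treats the species name as optional. The record layout, field names and sample sightings are all hypothetical; they are not taken from eBird, iSpot or any existing project.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Observation:
    """One sighting, stored as free attributes rather than a forced class."""
    observer: str
    location: Tuple[float, float]          # (latitude, longitude)
    attributes: Set[str] = field(default_factory=set)
    species: Optional[str] = None          # filled in only if the observer is sure


def with_attributes(observations: List[Observation], required: Set[str]) -> List[Observation]:
    """Return the observations that carry every attribute in `required`."""
    return [obs for obs in observations if required <= obs.attributes]


# Hypothetical sample data: one non-expert report and one expert report.
sightings = [
    Observation("volunteer_a", (47.56, -52.71), {"bird", "oil-covered", "black back"}),
    Observation("volunteer_b", (47.60, -52.68), {"bird", "white belly"}, species="Common murre"),
]

oiled = with_attributes(sightings, {"bird", "oil-covered"})
print([obs.observer for obs in oiled])
```

In this layout the non-expert's report of an oil-covered bird remains usable even though no species was named, and an expert can add a classification later without changing the schema.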

Jeffrey Parsons, Roman 
Lukyanenko and Yolanda 
Wiersma Memorial University of 
Newfoundland, Canada. 
jeffreyp@mun.ca 


Include Israel when 
comparing metrics 


Your readers deserve to see 
research metrics from the Arab 
world (Nature 469, 453 and 470, 
147; 2011) compared with those 
of its nearest neighbour, Israel. 
You compare the number of 
publications, researchers per 
million of population and the 
percentage of gross domestic 
product (GDP) expended on 
research and development 
(R&D). But all of your graphics 
omit Israel, even though the 
GDP graphic includes the 
European Union and Turkey. 
The picture would be 
different had Israel’s metrics 
been included. Israel published 
14,943 papers in 2008 (Science
Citation Index). In 2007, there
were 7,841 researchers per 
million population, and civilian 
expenditure on R&D totalled 
4.3% of GDP in 2009 — the 
highest percentage in the world 
(Central Bureau of Statistics, 
State of Israel). 

From 1948, Israel and its 
Arab neighbours started on a
roughly equal footing. Israel has 
achieved much, despite arguably 
being the poorer nation in terms 
of traditional measures such 
as land area, natural resources 
and freedom from conflict. Its 
strong investment in human 
capital, fostered by a free and 
open society, has produced six 
decades of spectacular growth. 
Those achievements stand 
in contrast to six decades of 
regrettably slow (and relatively 
static) progress in the Arab 
world. Your analysis would have 
been more accurately portrayed 
in this context. 

David W. Borhani Hartsdale, 
New York, USA. 
david.borhani@alum.mit.edu


Lay aside the 
ladder of descent 


Your argument that the curious 
Xenoturbella flatworm represents 
a “crucial intermediate stage of 
animal evolution” (Nature 470, 
161-162; 2011) perpetuates
a popular misconception,
stemming from a presumption 
that the features of such ancient 
living relics are intermediate 
between those of other extant 
creatures. 

Today’s organisms are all at the 
twig tips of one large tree of life, 
with no knowable connections 
between primitive and higher 
forms. Reproductively isolated 
populations of species, such 
as chimpanzees and humans, 
are not modifications on a
‘ladder’ of descent — thus 
living chimpanzees are not 
our ancestors, but a sister 
species adapted to a different 
habitat (tropical forests versus 
savannah). 

Xenoturbella has largely 
maintained its internal structure 
and body shape over millions of 
years of evolution, during which 
stabilizing selection removed
descendants that were less well
adapted to their environment 
than their parents. 

Such ‘living fossils’ have 
always occupied a narrow 
ecological niche, apparently 
without ever experiencing 
much competition from more 
complex organisms, and 
so may serve as models for 
reconstructing crucial steps in 
animal evolution. But they do 
not represent ‘intermediate’ 
evolutionary forms in the way 
that some of the famous fossils 
from the Mesozoic, such as the 
feathered dinosaurs or ancient 
snakes with hind legs, are viewed 
as earlier, extinct connecting 
links in the tree of life. 

U. Kutschera Institute of Biology, 
University of Kassel, Germany. 
kut@uni-kassel.de 


India needs more 
plant taxonomists 


India, with its wide range of 
geographical and climatic 
conditions, has a rich and 
varied flora of some 45,000 
species — almost 7% of the 
world’s flowering plants. 
But their documentation is 
seriously compromised by 
the country’s dearth of plant 
taxonomists. 

Although DNA sequence 
data and barcoding are well on 
the way to being accepted as 
the global standard for species 
identification, India’s plant 
taxonomists are struggling 
to keep up. A lack of proper 
training and infrastructure 
hampers molecular-systematics 
studies, so the evolutionary 
lineages of most of the 
country’s plants remain poorly 
understood. 

India's many outstanding 
botanists, familiar with 
regional flora, must help 
plant taxonomists to advance 
molecular-systematics studies 
and improve the evolutionary 
understanding of the country’s 
rich biodiversity. 

M. Ajmal Ali King Saud 
University, Saudi Arabia. 
majmalali@rediffmail.com 

R. K. Choudhary Korea 
Research Institute of Bioscience 
and Biotechnology, South Korea.


Figure 1 | Many eyes: a shoal of blue-striped snapper.


COLLECTIVE BEHAVIOUR 


When it pays to share decisions 


Theory suggests that the accuracy of a decision often increases with the number of decision makers, a phenomenon 
exploited by betting agents, Internet search engines and stock markets. Fish also use this ‘wisdom of the crowd’ effect. 


LARISSA CONRADT 


Having trouble making a decision? The
reason is that you're probably not sure
which is the best option. You seldom
have perfect information, so might make a 
bad choice. Sharing decisions with others can 
help, because several decision makers can pool 
their information, and also eliminate individ- 
ual errors’. Consequently, the risk of making 
a mistake and settling on a bad option often 
decreases with the number of decision makers. 
For example, in court cases, juries consisting 
of several people are supposed to make cor- 
rect decisions more often than can a single 
judge’. In humans, there are numerous exam- 
ples of this phenomenon. In social animals, 
the same principle should apply, but empirical 
demonstrations are rare. 
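
The jury argument can be made concrete with a small calculation. The Python sketch below assumes the classic Condorcet setting, which is an assumption added here and not the model used in any study discussed in this article: each of n decision makers is independently correct with the same probability p, and the group follows a simple majority.

```python
from math import comb


def majority_accuracy(n: int, p: float) -> float:
    """Probability that a strict majority of n independent decision makers
    is correct, when each is correct with probability p (n must be odd)."""
    if n < 1 or n % 2 == 0:
        raise ValueError("use a positive odd group size so that ties cannot occur")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    smallest_majority = n // 2 + 1
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(smallest_majority, n + 1)
    )


# Illustrative value only: individuals who are right 60% of the time.
for group_size in (1, 3, 5, 9, 15):
    print(group_size, round(majority_accuracy(group_size, 0.6), 3))
```

Under these assumptions the group's accuracy rises with group size whenever p is above one half, and falls with group size when p is below one half, which is why independence and better-than-chance individuals matter for the effect.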

Writing in Proceedings of the National Academy
of Sciences, Ward et al. now show that
larger shoals of fish not only make more-accu- 
rate decisions than do smaller shoals or single 
fish, but also make these decisions faster. In 
an elegantly designed experiment, combined
with theoretical modelling, the authors gave
mosquitofish, Gambusia holbrooki, a choice 
between a predator-free route and one that led 
past a predator model. A fish was more likely 
to make a correct choice (to avoid the predator 
model) the larger the shoal in which it swam. 
The size of this increase in accuracy was in 
close agreement with theoretical predictions. 
The effect did not arise because large shoals 
were more likely to contain one particularly 
clever ‘expert’ fish, which guided the others. 
In fact, individual fish did not differ much in 
their ability to make correct decisions and, 
moreover, were not even good at it. Thus, the 
authors have demonstrated a genuine ‘wisdom 
of the crowd’ (or, in biological terms, ‘many 
eyes’) effect’ (Fig. 1). 

The increase in decision speed with shoal 
size is especially noteworthy, for two reasons. 
First, we typically expect a trade-off between 
decision accuracy and speed, so that decision 
speed decreases with increasing accuracy and 
vice versa’. This is because more-accurate 
decisions usually require more information, 
and information gathering takes time (but see
also ref. 6). Second, we expect decision speed
to decrease with the number of decision mak- 
ers, because sharing decisions requires com- 
munication between decision makers and it 
seems plausible that this will also take time. 
Nevertheless, Ward and colleagues found that
both decision speed and decision accuracy 
increased with the number of decision makers 
(that is, the number of fish in the shoal). 

The reason that larger shoals managed to 
make not only more accurate but also faster 
decisions probably lies in the way information 
is communicated. Fish in shoals often move in 
a self-organized manner, whereby individuals 
react rapidly to the movements of close neigh- 
bours’. Indeed, Ward et al. present convinc- 
ing evidence that such a reaction to spatially 
close companions has a crucial role in the 
mosquitofish choice of route — pairs of fish 
within less than 6 centimetres of each other 
reacted very fast to each other’s movement 
changes; and a fish’s choice of route depended 
significantly on the average choice of close 
companions. 

This simultaneous, self-organized system of
communication has two important features.
One is that the speed with which information 
is exchanged is high and hardly decreases with 
the number of fish in the shoal. The other is 
that communication is decentralized: that is, 
information transfer can start from any shoal 
member’. This means that overall decision 
speed depends crucially on the fastest deci- 
sion maker(s) within the shoal. For stochastic 
reasons, a large shoal is more likely than a small 
one to contain a fish that, by chance, detects a 
predator relatively quickly, even if the fish do 
not differ in ‘expertise’.

In short, the higher likelihood of a shoal 
containing a fast decision maker, coupled 
with swift, decentralized information transfer, 
could explain the increase in decision speed 
with shoal size. However, such fast decision 
making usually also involves a cost, namely 
that of an increased risk of false positives’. That 
is, if the fastest decision maker made a mistake 
(and ‘detected’ a predator that did not exist), 
this mistake could also be amplified’, and 
the group might stage a costly ‘escape’ when
none was necessary. The experiments of Ward 
and colleagues’ did not allow for such a situa- 
tion — there was always one predator model 
present, and fish could either avoid it (true 
positive) or not (false negative). It remains to 
be seen whether accuracy and speed of deci- 
sion making still increase together if fish are 
faced with a situation in which false positives 
are possible. 
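
Both the speed benefit and the false-positive risk follow from the same piece of arithmetic, sketched below in Python. The sketch assumes that fish act independently within a short time window and that a single reaction spreads through the whole shoal; the two per-fish probabilities are hypothetical values chosen for illustration, not measurements from the experiments.

```python
def p_at_least_one(n: int, p: float) -> float:
    """Probability that at least one of n independent individuals reacts,
    when each reacts with probability p."""
    if n < 0 or not 0.0 <= p <= 1.0:
        raise ValueError("need n >= 0 and 0 <= p <= 1")
    return 1.0 - (1.0 - p) ** n


# Hypothetical per-fish probabilities within a short time window.
P_DETECT = 0.20       # a fish notices a predator that is really there
P_FALSE_ALARM = 0.02  # a fish 'notices' a predator that is not there

for shoal_size in (1, 2, 4, 8, 16):
    quick_detection = p_at_least_one(shoal_size, P_DETECT)
    false_alarm = p_at_least_one(shoal_size, P_FALSE_ALARM)
    print(f"{shoal_size:2d} fish: detection {quick_detection:.3f}, "
          f"false alarm {false_alarm:.3f}")
```

Under these assumptions both probabilities grow with shoal size, so the mechanism that makes a large shoal quick to detect a real predator would also make it more prone to unnecessary escapes.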

Fast and accurate decision making is highly 
desirable in many walks of life, for humans as 
well as animals. Ward and colleagues’ study 
shows that it can be achieved by sharing deci- 
sions widely and using a self-organized system 
of communication. This is, of course, exactly 
the strategy that has long been exploited by 
Internet search engines, and in this sense the 
mosquitofish of Ward and co-workers’ experi- 
ments are not that dissimilar from Google. 

However, there are three caveats about the 
benefits of decision sharing. First, if the abili- 
ties of potential decision makers vary widely, 
it might still be better to listen to one ‘expert’.
Second, there is the danger of information 
cascades, whereby decision makers no longer 
contribute independent information but 
instead amplify shared misconceptions’. 
Finally, in many decisions, the goals of indi- 
vidual decision makers differ: that is, different 
members of the decision-making group favour 
different outcomes. In such a context, the sharing
of decisions can have disadvantages as well
as advantages®. Although sharing might still 
increase the available information, it can also 
hand influence on the outcome to decision 
makers whose goals differ from your own. To 
date, surprisingly little is known about good 
decision-making strategies in these kinds of 
conflict situations.


Larissa Conradt is in the Department of 
Biology, School of Life Sciences, University of
Sussex, Falmer, Brighton BN1 9QG, UK.
e-mail: l.conradt@sussex.ac.uk


Atoms playing dress-up


The idea of using ultracold atoms to simulate the behaviour of electrons in new 
kinds of quantum systems — from topological insulators to exotic superfluids and 
superconductors — is a step closer to becoming a reality. SEE LETTER P.83 


MICHAEL CHAPMAN & CARLOS SA DE MELO 


During the past decade or so, physicists
have been trying to implement
one of the last of Richard Feynman’s
ingenious ideas. This was to build a ‘quantum 
scale model’, using controllable quantum particles,
to simulate the workings of otherwise
intractable quantum systems and to investigate 
thorny problems in condensed-matter physics. 
On page 83 of this issue, Lin and colleagues” 
inch closer to building a new kind of quantum 
simulator using cold gases of atoms. 

At less than one-millionth of a degree above
absolute zero, cold atomic gases are extremely 
versatile and can be controlled with great 
precision. They can be composed of bosons 
(particles with integer spin) or fermions (par- 
ticles with half-integer spin). And, just like 
electron gases, they can be confined in a 
variety of environments, including crystalline 
lattices and disordered media. Furthermore, 
the mutual interactions between the atoms of 
a gas can be controlled, by modifying atomic 
collisions, to mimic real, solid-state systems. 
Using these tools, researchers have been able 
to reproduce the essential quantum physics of 
several canonical condensed-matter systems, 
including superfluids, in which particles (elec- 
trons or atoms) move without resistance, and 
insulators, in which particles are pinned to an 
underlying lattice structure. 

However, exploring some of the remain- 
ing uncharted territory in condensed-matter 
physics using cold atomic gases will require 
additional tools. One of the things missing 
from the toolbox had been the ability to mimic 
the effects of magnetic fields on the electron’s 
charge — a challenge because atoms are neu- 
tral. These effects are central to many exotic 
phenomena, including the quantum Hall effect 
and superconductivity. In an earlier study, Lin 
and colleagues demonstrated a solution:
they generated a fictitious magnetic field in an 
atomic system, using tailored beams of light.
Now, the same group adds a new tool to the
toolbox by creating artificial ‘spin-orbit cou- 
pling’ in a neutral atomic system. But in order 
to understand the significance of this experi- 
mental achievement, let us take a step back and 
understand the concept of spin-orbit coupling. 

In addition to their electronic charge, elec- 
trons (like all fundamental particles) have an 
intrinsic spin. Loosely, we can think of the 
electron as spinning about an axis through its 
centre, with the spinning giving it a magnetic 
character similar to that of a tiny bar magnet. 
Atoms, being composed of fundamental par- 
ticles, also have an intrinsic spin. But how does 
the spin of particles interact with their orbital 
motion? 

The interaction of an object’s spin with 
its orbit (spin-orbit coupling) is ubiquitous 
in both the microscopic and macroscopic 
worlds. One example is the synchronization 
of the rotation (spinning) of the Moon and 
its orbit around Earth, which means that we 
can only see one face of the Moon. Another 
example is the motion of electrons orbiting 
an atom’s nucleus: the motion is altered by 
the spin of the electrons owing to the electric 
field of the nucleus, and this gives rise to the 
atom’s fine structure (small shifts in its energy 
levels). Similar effects occur in free electrons 
moving through electric fields in solids, for 
example the fields generated by the underlying 
crystalline lattice. 

It is hoped that quantum simulators based 
on atomic gases will illuminate the physics 
of electron systems. But it is first necessary 
to devise a technique to make neutral atoms 
mimic the interaction of the spin of moving 
electrons with electric fields, and so engineer 
spin-orbit coupling (Fig. 1). Building on a
recent theoretical suggestion’ of how this 
might be accomplished, Lin et al.’ were able 
to create experimentally an artificial coupling 
between the spin of rubidium (⁸⁷Rb) gas atoms
(bosons of spin 1) and their centre-of-mass 
motion. To achieve the coupling, the authors
used a pair of lasers to transfer linear momentum
to the atoms’ centre-of-mass and create
mixed atomic spin states, which are composed
of two different spin orientations. The mixed-spin
states couple directly with the momentum
transferred to the atoms’ centre-of-mass
(orbital) motion, creating a ‘dressed state’, thus
leading to an artificial spin-orbit coupling.
(For a review of related ideas, see ref. 6.)


Figure 1 | Spin-orbit coupling. a, In an atom,
an electron (orange) orbits the nucleus (blue; here
composed of a single proton). From the electron’s
point of view, the proton orbits the electron and
produces a magnetic field that couples with the
electron’s spin and alters its orbit. b, If the electron
is roaming freely through a group of ions, from
the electron’s point of view it is the ions that move.
The ions’ motion generates a magnetic field that
couples to the electron’s spin. In real solids, this
coupling between the electron’s spin and its
motion (spin-orbit coupling) is more complex,
but the essence of the interaction is the same as
that depicted here. Lin and colleagues engineer
spin-orbit coupling in a neutral atomic system.

A great advantage of the authors’ experi- 
ment’ lies in the possibility of controlling 
spin-orbit coupling — from no coupling 
at all to strong coupling — through optical 
means. If the lasers are turned off, spin-orbit 
coupling disappears: the spin and the centre- 
of-mass motion are independent. If the lasers 
are turned on, spin-orbit coupling occurs and 
scales with the lasers’ intensity. This type of 
control is not typically available in condensed- 
matter systems such as in semiconductors or 
superconductors. 

What’s more, Lin and colleagues” have 
shown that the artificial spin-orbit coupling 
can be used to change the interaction between
atoms that are in different spin states. The
ability to change the interactions between a 
pair of atoms allows the researchers to study 
transitions between a phase in which atoms 
with different spin states repel weakly, and are 
mixed in the same spatial region (lasers off), 
to a phase in which atoms with different spin 
states repel strongly and are spatially separated 
(above a threshold of laser intensity). 

The authors’ creation and control of artifi- 
cial spin-orbit coupling in atoms has implica- 
tions beyond atomic-gas physics, in particular 
because there is no fundamental reason why 
their experiments should not be performed 
with fermions. In condensed-matter systems, 
the spin-orbit coupling of the constituent electrons
(fermions of spin ½) can have important
consequences for semiconductors, supercon- 
ductors and magnetic materials. In mercury 
telluride (HgTe) semiconductors, for exam- 
ple, strong spin-orbit coupling can produce 
topological insulators’. These unconventional 
semiconductors insulate electric current in 
their bulk but conduct electricity on their
surface, a rather unusual and peculiar effect
that may be useful for electronic applications. 
The creation of adjustable artificial spin-orbit 
coupling in atoms opens up exciting possi- 
bilities for realizing quantum simulators 
of topological insulators and exotic forms of 
superfluidity and superconductivity.


Protection from the outside


Protein folding is a high-stakes process, with cell dysfunction and death being 
the unforgiving penalties for failure. Work in bacteria hints that organisms 
manage this process beyond the boundaries of the cytoplasm — and even the cell. 


EVAN T. POWERS & WILLIAM E. BALCH 


Protein misfolding can instigate disease
one way or another: it can cause both
loss of function by leading to an insufficient
amount of functional proteins, and gain
of toxic function through the aggregation of
misfolded proteins. Suppressing misfolding
and aggregation is the job of the proteostasis
network, particularly the various classes
of chaperones, evolutionarily conserved
proteins that help other proteins to fold productively.
Folding protection must operate
in many environments, both inside and outside
the cell. Writing in Nature Structural and
Molecular Biology, Quan et al.4 identify in bacteria
a new structural class of chaperone called
Spy that, unusually, functions outside the
typical cellular remit for chaperone activity.
For their analysis, Quan and colleagues
created two ‘sandwich fusion proteins’ by
inserting L53A I54A Im7 (an unstable version
of the protein Im7, which is often used
in protein-folding studies) into two other
proteins: β-lactamase and DsbA. When folded,
β-lactamase and DsbA confer resistance to
the antibiotic penicillin and to cadmium ions
(Cd²⁺), respectively. However, the insertion of
a foreign protein into their sequences makes
their folding dependent on the folding of the
inserted protein. Thus, in the sandwich fusion
proteins, L53A I54A Im7 folding leads to two
independent selectable markers: penicillin
resistance and Cd²⁺ resistance.

The authors4 induced expression of their
fusion proteins in the periplasm of the bacterium
Escherichia coli; the periplasm is the
space between the inner and outer membranes
in Gram-negative bacteria. In most cases, they
observed no resistance to either penicillin or
Cd²⁺, presumably because the inability of
L53A I54A Im7 to fold prevented β-lactamase
and DsbA from folding. A number of strains,
however, did gain both penicillin and Cd²⁺
resistance.

The resistant strains also produced a 
massive amount of Spy, suggesting that 
this little-known periplasmic protein had 
a hitherto unrecognized chaperone activ- 
ity. The researchers corroborate this result 
in vitro, showing that Spy can both inhibit
aggregation and promote folding, even at
sub-stoichiometric concentrations.

Quan et al. also show that Spy activity is inde- 
pendent of the cellular energy molecule ATP. 
This is not surprising, given that the protein 
functions outside the cytoplasm. However, 
operation of Spy at sub-stoichiometric concen- 
trations is surprising, because chaperones that 
work in this way generally use ATP. According
to conventional wisdom, it is difficult — if not 
impossible — to imagine a mechanism for how 
a chaperone actively remodels the protein-fold- 
ing energy landscape without an energy input. 
It is equally difficult to reconcile Spy’s effects on 
protein folding and aggregation with a simple 
holdase mechanism, in which a chaperone 
passively binds to unfolded proteins. 

There could be several explanations. To 
protect nascent peptides emerging through 
the inner membrane, Spy could work dur- 
ing protein translation, binding transiently 
to nascent proteins to stabilize them. Spy 
could be an efficient protective osmolyte, and 
thus thermodynamically stabilize proteins’ 
native states by promoting the formation of 
hydrogen-bonded secondary structures’, 
which would be consistent with its high lev- 
els in the periplasm. Or Spy could be a steric 
foldase — a type of chaperone that stabi- 
lizes the folded state of proteins by binding 
to them®. Clearly, Spy’s mechanism of action 
merits further investigation. 

The discovery of Spy adds to the current 
repertoire of chaperones functioning in the 
periplasmic space of Gram-negative bac- 
teria’ and raises questions about the existence 
of extra-cytoplasmic, or outer, proteostasis 
networks (the outPN) in complex eukary- 
otes (plants and animals). Whereas the bac- 
terial inner membrane rigorously protects 
the cytoplasm and the intracellular proteo- 
stasis networks (inPN), the outer membrane 
is permeable to small molecules (those with 
a molecular mass of less than roughly 600 daltons). It
functions as a filter to retain periplasmic pro- 
teins close to the surface of E. coli, thus pre- 
venting their dilution in the environment. It is 
perhaps only a modest stretch to compare the 
bacterial periplasmic space to the interstitial 
spaces in vertebrates (Box 1). 

Unfortunately, our knowledge of the com- 
position and function of the outPN in complex 
eukaryotes is limited. Although small amounts 
of the classic chaperones Hsp70 and Hsp90 can 
be found outside the cell under stress condi- 
tions”, their roles remain controversial, and the 
lack of extracellular ATP makes them ill-suited 
to a chaperoning role outside the cell. In addition,
abundant plasma proteins such as albumin
and globulins can bind to other proteins,
but their potential role as outPN components 
remains to be carefully explored. Nonetheless, 
there is evidence for potential outPN players 
that chaperone defective proteins — including 
α1-acid glycoprotein, α-1-antitrypsin,
asialoglycoproteins, plasma gelsolin, clusterins,
α2-macroglobulins and transthyretin,
which is thought to be protective against
Alzheimer’s disease.


Box 1 | Chaperone networks


In addition to the intracellular proteostasis
network (inPN) in its cytoplasm,
Escherichia coli produces many chaperones
(including Spy, identified by Quan et al.4)
that protect protein folding in the periplasm
in an ATP-independent manner (a).
Mammals have a number of distinct
interstitial spaces filled with bodily fluids
that could also operate independently of
ATP to protect the major organ systems (b).
However, unlike the periplasmic space of
E. coli, which is open to the environment, the
interstitial systems are closed. Interstitial
fluids ultimately communicate with the
environment through the kidney filtration
system, or through uptake and metabolism
by the liver.
Plasma (red) provides components of
the extracellular chaperone network (outPN)
to the peritoneal (abdomen), pericardial
(heart), pleural (lungs), synovial (joints)
and amniotic fluids (for simplicity, all
grouped in pink). Each might form an
interstitial system protecting a separate
organ system, and all have a rich protein
content, reflecting their passive coupling

Is there an equivalent of stress-related Spy 
induction in humans? At least one possibility 
is the proteins whose levels increase during the 
acute-phase response to inflammation” (such 
as α1-acid glycoprotein and haptoglobin)
and that have protein-folding protective func- 
tions. Even the innate and adaptive immune 
responses could be seen as highly evolved 
outPN systems (Box 1). 

Undoubtedly, the intracellular proteostasis 
network is conserved and universal. But the
observations that the seemingly lowly E. coli
can protect itself from a periplasmic folding 
problem by the production of Spy and other 
non-ATP-dependent chaperones could shift 
our view of the role of the interstitial space 
towards it being a home for a comparable 
extracellular proteostasis network in vertebrates.
Indeed, the outPN in vertebrates
could report on and manage extracellular 
protein-folding stress, working in parallel with 
inflammatory and immune responses (Box 1). 
After all, like E. coli, vertebrates experience 
stressful situations every day.


Annelid who’s who


The origin of the annelids is buried in distant evolutionary time. A molecular 
phylogeny resolves their deep family interrelationships and provides a picture of 
their ‘urannelid’ ancestor. SEE LETTER P.95 


DETLEV ARENDT 


scarcely believe we know little about their 
origins. Take earthworms, for example, 
common to everyone's garden, which belong to 
a large phylum of invertebrates — the annelids 
(ringed worms). The hitherto shrouded evolu- 
tionary history of annelids is now illuminated 
by Struck et al. on page 95 of this issue’. 
Annelids are global players in terrestrial 
and freshwater environments, and in marine 
ecosystems, where they live in and on the sea 
floor. But the identity of their nearest relatives 
(maybe molluscs, maybe flatworms), and 
even their affinities within the phylum, has 
remained a puzzle. This lack of understand- 
ing has another edge, given that vari- 
ous annelid species serve as model 
organisms for the investigation of 
basic biological processes. Embryolo- 
gists and neuroscientists have studied 
leeches for decades. And the ragworm 
Platynereis has recently emerged as a 
valuable model for studying develop- 
ment, evolution and neurobiology’, 
along with Capitella, Hydroides and 
other marine species’. Just imagine 
working on vertebrate models such as 
fish, mice and frogs without knowing 
their evolutionary interrelationships. 
Struck et al.¹ have made a significant
advance in the reconstruction of
annelid phylogeny, having resolved 
their internal affinities. Using molecu- 
lar techniques, they have studied the 
relationships between various annelid 
families and orders, and have obtained 
a surprisingly clear result. It turns out 
that annelids are deeply subdivided 
into two main groups, the Errantia (to 
which the ragworm belongs) and the 
Sedentaria (to which the leech, earth- 
worm, Capitella and Hydroides belong). 
As the names suggest, the sub- 
division of annelids into the Errantia 
and Sedentaria matches their over- 
all lifestyles (Fig. 1). Members of the 
Errantia are free to move about, and 
crawl, swim or burrow. Many are 
predators or feed on macroalgae. By 
contrast, representatives of the Sed- 
entaria are hemi-sessile burrowers or 
tube dwellers (apart from the highly 
specialized, parasitic leeches). They
eat sediment or surface deposits, or filter the
surrounding water with their tentacle crowns. 

In 1865, a similar grouping, but excluding 
the earthworms and leeches, was put forward 
by Jean Louis Armand de Quatrefages de Bréau. 
However, it was dismissed by later authors, who 
considered the similarities in lifestyle to be con- 
vergent adaptations due to similar ecological 
constraints. Ironically, Struck and colleagues’
study¹ reveals that the older classification was
closer to the truth, thus ‘revising the revision’.
The authors’ phylogeny demonstrates that 
broad features of lifestyle and morphology, 
even if sometimes challenging to quantify, 
can be at least as informative as ultrastruc- 
tural or fine morphological characteristics, 
and are not necessarily much more prone to
the complication of convergent evolution.


Figure 1 | An illustration of the Errantia and Sedentaria by
Ernst Haeckel, dated 1904. To take just two examples, top right is
Eunice magnifica (Grube, 1866), Eunicidae, an errantian; top left
is Sabellastarte spectabilis (Grube, 1878), Sabellidae, a sedentarian.
Errantians often have especially prominent lateral appendages
with bristles, for undulatory crawling. Many sedentarians exhibit
beautiful tentacle crowns for filtering plankton and other food
particles from the water.


© 2011 Macmillan Publishers Limited. All rights reserved

The work is another success story in the 
young discipline of phylogenomics — which 
attempts to resolve evolutionary history by 
genomic comparisons — and is one of the first 
aimed at probing deep within a phylum. Struck 
and co-workers have sequenced about a thou- 
sand expressed-sequence tags (complemen- 
tary DNA library clones) from 17 members of 
annelid families and complemented these col- 
lections with existing data, yielding a total rep- 
resentation of 34 annelid species. They selected 
231 genes common to at least one-third of the 
total taxa, and aligned and concatenated them 
into a supermatrix of 47,953 amino acids. 
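
The selection and concatenation step just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `build_supermatrix`, the gap character and the toy sequences are invented for the example, and the per-gene alignments are taken as already computed.

```python
def build_supermatrix(alignments, taxa, min_fraction=1 / 3, gap="-"):
    """Keep genes sampled in at least `min_fraction` of the taxa and
    concatenate their alignments, padding missing taxa with gaps.

    alignments: dict mapping gene name -> {taxon: aligned sequence};
                all sequences of one gene must have the same length.
    Returns (kept_genes, {taxon: concatenated sequence}).
    """
    kept = []
    for gene in sorted(alignments):
        sampled = [t for t in taxa if t in alignments[gene]]
        if len(sampled) >= min_fraction * len(taxa):
            kept.append(gene)

    matrix = {t: "" for t in taxa}
    for gene in kept:
        seqs = alignments[gene]
        length = len(next(iter(seqs.values())))
        for t in taxa:
            # A taxon lacking this gene contributes a block of gaps.
            matrix[t] += seqs.get(t, gap * length)
    return kept, matrix


# Hypothetical toy data: four taxa, three genes of differing coverage.
toy_taxa = ["a", "b", "c", "d"]
toy_alignments = {
    "g1": {"a": "MK", "b": "MR", "c": "MK", "d": "MK"},
    "g2": {"a": "LLV", "b": "LIV"},
    "g3": {"d": "W"},  # sampled in one of four taxa, below one-third
}
kept_genes, supermatrix = build_supermatrix(toy_alignments, toy_taxa)
```

In the toy data, `g3` is dropped because it is present in only one of four taxa, and taxa `c` and `d` receive gaps for `g2`. Missing data of this kind are the price of including sparsely sampled species in a supermatrix.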

Next, they reconstructed a phylogenetic tree 
using refined methods capable of handling 
the diversity of amino-acid substitution pro- 
cesses in such a supermatrix. Many of the 
nodes in their tree, especially that separating 
the Errantia from the Sedentaria, had remark- 
ably high support values (contrasting with 
those of previous annelid phylogenies based on 
single genes*), making it highly likely that this 
grouping is definitive. 

The case of the annelids exemplifies both the 
beauty and the pitfalls of phylogeny reconstruc- 
tion when applying the principle of parsimony, 
which settles on the tree minimizing gain 
or loss of particular characteristics. At the 
molecular level, this approach has proved 
very powerful, and it has been further 
enhanced by the advent of phylogenom- 
ics. But it is becoming increasingly 
obvious that, on the basis of morphologi- 
cal characteristics alone, there is a serious 
problem: the apparent ease with which 
such characteristics are lost. 
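
To make the parsimony criterion concrete, the sketch below counts the minimum number of character changes a tree requires, using the standard Fitch algorithm for a rooted binary tree. The taxon names, the presence/absence scores and both trees are hypothetical; they are chosen only to show how a trait that is easily lost can mislead a morphology-only analysis.

```python
def fitch(tree, states):
    """Fitch parsimony for one character on a rooted binary tree.

    tree:   a leaf name (str) or a (left, right) tuple of subtrees.
    states: dict mapping leaf name -> observed character state.
    Returns (possible ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The two branches agree on at least one state: no extra change.
        return common, left_cost + right_cost
    # The branches disagree: one change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical scores: 1 = head appendages present, 0 = absent.
appendages = {"errant_1": 1, "errant_2": 0, "sedent_1": 1, "sedent_2": 0}

# Tree grouping taxa by lifestyle (as in the molecular phylogeny).
lifestyle_tree = (("errant_1", "errant_2"), ("sedent_1", "sedent_2"))
# Tree grouping taxa by presence of the appendages.
appendage_tree = (("errant_1", "sedent_1"), ("errant_2", "sedent_2"))

_, lifestyle_changes = fitch(lifestyle_tree, appendages)   # 2 changes
_, appendage_changes = fitch(appendage_tree, appendages)   # 1 change
```

Judged on this single character, parsimony prefers the tree that unites the appendage-bearing taxa (one change) over the lifestyle tree (two changes). If the appendages were in fact lost independently in each lineage, the tree with fewer changes is the wrong one.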

This point is illustrated by a morpho- 
logical parsimony analysis of annelid 
phylogeny” that established a group, 
the Palpata, whose members were 
defined by the presence of specific head 
appendages (palpae) — the implication 
being that other groups without palpae 
had never had them. The new annelid 
phylogeny instead indicates independ- 
ent loss of palpae in errantian and 
sedentarian groups, as was previously 
suggested®” by two of the co-authors of 
the current paper’. This example cor- 
roborates the general idea of frequent 
and independent loss of traits during 
animal evolution. 

Finally, the new work’ nicely illustrates 
how, once the (molecular) phylogeny has 
been solved, matrices of morphological 
characteristics can be used to reconstruct 
the common ancestors of the respective 
groups. If a given characteristic is found
in both branches resulting from a node,
it must have been present in the common
ancestor. In this way, we can infer a lot
about the ‘urannelid’, the last common
ancestor of all annelids. Most significantly,
it was an animal richly equipped
with sensory organs. It had chemosensory
nuchal organs and palpae; a pair of two-celled 
larval eyes for phototaxis’; and possibly a pair 
of more elaborate, multicellular adult eyes with 
an alternating arrangement of rhabdomeric 
photoreceptors and shading pigment cells. The 
latter combination is found in extant errantians® 
and in the sipunculans’, which represent an out- 
group to both errantians and sedentarians (see 
Fig. 1 of the paper’). 

The urannelid was segmented, a detail that 
is clear from the nested position of two unseg- 
mented taxa, the echiurans and sipunculans, 
within segmented groups’. This ancestor 
probably lived on the sea floor, using its rela- 
tively complex lateral appendages for undula- 
tory crawling (as seen in today’s Errantia and 
for example in the Spionidae, which lie in the 
basal part of the Sedentaria branch of the tree). 
Given the power of phylogenomics, we might
soon know what the urannelid’s mollusc- or
flatworm-like relatives looked like in the
ancient oceans.


CLIMATE CHANGE


Another Antarctic rhythm


A novel explanation for the long-term temperature record in Antarctic ice cores
invokes local solar radiation as the driving agent. This proposal will prompt 
palaeoclimate scientists to pause and to go back to basics. SEE LETTER P.91 


KOJI FUJITA 


Analysis of ice cores retrieved from the
Antarctic and Greenland ice sheets
is one of the main sources of our
understanding of past climate. A component 
of that understanding is that, on timescales 
of 20,000 years and more, climate change in 
Antarctica is determined by the amount of 
solar radiation (insolation) reaching high 
northern latitudes in summer. On page 91 
of this issue, Laepple et al.’ call into question 
some of the evidence for that view. 

Precisely dated polar ice cores have allowed 
examination of the ‘bipolar see-saw’ rela- 
tionship of air temperatures between the 
hemispheres on millennial timescales’, as well 
as of longer-term, glacial-interglacial climate 
change paced by variations in Earth's orbit — 
the Milankovitch forcing of ice ages’. In these 
studies, the use of isotopes that are stable in 
water, in the form of the ratios of oxygen and 
deuterium isotopes, is well established. These 
ratios constitute the fundamental proxy meas- 
urements for estimating past temperatures 
from ice cores at both poles”*. 

Because ice cores consist of ice, the stable- 
isotope ratios in the ice stem from those con- 
tained in precipitation (snow, which becomes 
compacted to ice). In other words, if there is
no precipitation, no isotopic signal remains
in the ice core. This simple principle has been 
acknowledged in interpreting the Greenland 
ice-core record’. Subsequent studies®” have 
described how changes in the seasonal pat- 
tern of precipitation during glacial-interglacial 
cycles have significantly biased the isotopic 
temperature record in Greenland. But it was 
thought that the effect in Antarctica was prob- 
ably minor because of its comparatively stable 
precipitation seasonality. 

Laepple and co-authors’ apply this idea of 
precipitation seasonality to the Antarctic ice- 
core record. However, they do not deal with 
changes in seasonal patterns, as the previous 
studies did, but instead consider the situation 
in which seasonality is itself unchanging and in 
which snow accumulation over inland Antarc- 
tica is maximal in winter and minimal in sum- 
mer. This seasonality in snowfall has various 
causes, such as the strong radiative cooling that 
induces clear-sky precipitation and increased 
moisture transport in winter, and sublimation 
of ice into water vapour in summer. 

By assuming that this seasonal pattern of 
snow accumulation has persisted throughout 
glacial—interglacial cycles, and that the local air 
temperature has fluctuated according to the 
present-day relationship between temperature
and insolation, the authors¹ produce an
accumulation-weighted insolation signal as a
record of temperatures in Antarctic ice cores. 
They find that it has the opposite phase to the 
orbital-precession component (determined 
by long-term changes in the orientation of 
Earth’s rotational axis) of the local summer- 
insolation signal — and so, surprisingly, that 
it is in phase with summer-insolation intensity
in the Northern Hemisphere. 
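
The idea of accumulation weighting can be illustrated with a toy calculation. The sketch below is not Laepple and colleagues' model: the two-season set-up, the function name `accumulation_weighted_mean` and all temperatures and snowfall amounts are invented for illustration. It shows only that a record built from snow reflects each season in proportion to how much snow falls in it.

```python
def accumulation_weighted_mean(temperatures, accumulation):
    """Mean temperature as archived in snow: each season's temperature
    is weighted by the amount of snow deposited during that season."""
    if len(temperatures) != len(accumulation):
        raise ValueError("need one accumulation value per temperature")
    total = sum(accumulation)
    if total <= 0:
        raise ValueError("total accumulation must be positive")
    weighted = sum(t * a for t, a in zip(temperatures, accumulation))
    return weighted / total


# Hypothetical inland-Antarctic year: [summer, winter].
temps = [-30.0, -60.0]   # degrees Celsius (made-up values)
snow = [1.0, 3.0]        # relative accumulation, maximal in winter

recorded = accumulation_weighted_mean(temps, snow)            # -52.5
plain_mean = sum(temps) / len(temps)                          # -45.0

# Warm one season by 4 degrees and see how far the record moves.
summer_shift = accumulation_weighted_mean([-26.0, -60.0], snow) - recorded
winter_shift = accumulation_weighted_mean([-30.0, -56.0], snow) - recorded
```

With three times as much snow in winter as in summer, a 4 °C summer warming moves the archived value by 1 °C, whereas the same warming in winter moves it by 3 °C. A record weighted in this way therefore tracks winter conditions far more closely than summer ones.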

If the Antarctic local temperature is deter- 
mined by local insolation, the precession 
component in the ice-core temperature signal 
should be out of phase with Northern Hemi- 
sphere insolation, because the precession 
component is out of phase between the two 
hemispheres. However, the precession com- 
ponent filtered from the isotopic temperature 
record in the Antarctic ice cores is coherent 
and in phase with the Northern Hemisphere 
insolation intensity’ — seemingly support- 
ing the Milankovitch theory, according to 
which southern climate is driven by insolation 
changes at high northern latitudes. 

But does the close phasing necessarily sup- 
port a causal relationship? Perhaps not. Laep- 
ple and co-authors’ have rethought how the 
signals of temperature change are produced. 
Their accumulation-weighted insolation 
record suggests that a precession rhythm syn- 
chronized with — but not caused by — the 
Northern Hemisphere could be generated if 
the local temperature fluctuated in line with 
local insolation conditions in the Southern 
Hemisphere. The unveiling of this ‘pseudo-rhythm’
strikes at the foundation of temperature
estimates gleaned by analysing isotope
ratios in ice cores. Does it mean, as Laepple 
et al. suggest, that the evidence from Antarctic 
ice cores cannot be used to support or refute 
the Milankovitch theory? 

This theory is supported not just by tem- 
peratures inferred from Antarctic ice cores, 
but also by sea surface temperatures recorded 
in sediment cores from the Southern Ocean. 
In these cores, the orbital-precession rhythm is 
often found to be in phase with summer inso- 
lation in the Northern Hemisphere and there- 
fore opposing the local summer insolation’. 
The seasonality of snow accumulation does 
not affect sediment processes in the ocean. 
Furthermore, the existence of shorter (millen- 
nial timescale) but strong bipolar see-saw con- 
nections between the two hemispheres implies 
that there are indeed mechanisms for the inter- 
hemispheric propagation of climate signals 
through the ocean and/or atmosphere’. There 
is no reason to believe that such mechanisms 
have not operated over longer timescales. 

A caveat regarding the results themselves 
is that Laepple and colleagues’ insolation- 
based air-temperature estimate shows a 
rather small amplitude (around 0.7 °C peak 
to peak) compared with that derived from 
ice cores (3 °C peak to peak). This is probably
because the authors’ use of local insolation
as the temperature proxy means that they
assume zero insolation during winter (polar
night) throughout glacial–interglacial cycles.
They themselves acknowledge this point, and 
suggest that other factors not accounted for in 
their approach may explain the discrepancy. 
Nevertheless, we must now consider that 
the orbital-precession rhythm in Antarc- 
tic ice cores can partly be attributed to local 
conditions. In the same way that an ill-fitting 
piece of a jigsaw puzzle can be disconcerting, 
this pseudo-rhythm will be discomfiting to 
those who study palaeoclimate and climate 
dynamics. ‘Is the signal I see really created by 
climate change?’ is a question they will have
to ask themselves. And they will need to take
a hard look at the principles on which their
data are founded. The relationship between
the isotopes in water and air temperature, for
instance, is based on geographical (spatial)
observations only. But its temporal variability
has not been confirmed at any ice-core drilling
sites in inland Antarctica, even by observations
on an annual timescale. Sometimes, in science
as in life, it is necessary to pause in order to
make progress.


Koji Fujita is in the Graduate School of
Environmental Studies, Nagoya University,


STEM CELLS


The dark side of induced pluripotency


Induced pluripotent stem cells have great therapeutic potential. But genomic and 
epigenomic analyses of these cells generated using current technology reveal 
abnormalities that may affect their safe use. SEE ARTICLES P.58, P.63 & P.68 


MARTIN F. PERA 


Induced pluripotent stem cells (iPSCs) are
generated through the reprogramming of
differentiated adult cells and can be coaxed
to develop into a wide range of cell types. They 
therefore have far-reaching potential for use 
in research and in regenerative medicine. 
But the ultimate value of these cells as dis- 
ease models or as sources for transplantation 
therapy will depend on the fidelity of their 
reprogramming to the pluripotent state, and 
on their maintenance of a normal genetic 
and epigenetic (involving aspects other than 
DNA sequence) status. Five recent surveys’, 
including three in this issue’ *, show that the 
reprogramming process and subsequent cul- 
ture of iPSCs in vitro can induce genetic and 
epigenetic abnormalities in these cells. The 
studies raise concerns over the implications 
of such aberrations for future applications 
of iPSCs. 

It has long been known‘ that, during culti- 
vation in vitro, human embryonic stem cells 
(ESCs) can become aneuploid; that is, they 
acquire an abnormal number of chromosomes. 
The new papers have applied various state- 
of-the-art genomic technologies to assess in 
detail the occurrence and frequency of genetic 
and epigenetic defects in both human iPSCs 
and ESCs. 

Hussein et al.¹ (page 58) studied copy
number variation (CNV) across the genome
during iPSC generation, whereas Gore and colleagues²
(page 63) looked for point mutations
in iPSCs using genome-wide sequencing of
protein-coding regions. Lister et al.³ (page 68)
examined DNA methylation (an epigenetic
mark) across the genomes of ESCs and
iPSCs at the single-base level. These studies, 
along with other investigations into changes 
in chromosome numbers* and CNV’ in the 
two kinds of stem cell, lead to the conclu- 
sion that reprogramming and subsequent 
expansion of iPSCs in culture can lead to the 
accumulation of diverse abnormalities at the 
chromosomal, subchromosomal and single- 
base levels. Specifically, three common themes, 
regarding the genetic and epigenetic stability 
of ESCs and iPSCs, emerge. 

First, by several measures, iPSCs display 
more genetic and epigenetic abnormali- 
ties than do ESCs or fibroblasts — the cells 
from which they originated. Chromosomal 
abnormalities appear early during the cul- 
turing of iPSCs°, a phenomenon not gener- 
ally observed in ESCs. Also, the frequency 
of mutations in iPSCs is estimated to be ten 
times higher than in fibroblasts’. And there 
are greater numbers of novel CNVs (CNVs 
not found in the cell of origin or in human 
genomes of comparable background) in iPSCs 
than in ESCs". Similarly, the epigenome of 
iPSCs features incomplete reprogramming 
(with cells retaining epigenetic marks of 
the cell of origin), aberrant methylation of 
CG dinucleotides, and abnormalities in non- 
CG methylation — an epigenetic feature seen 
only in pluripotent cells’. 

Second, the studies show that genetic abnor- 
malities can arise at different stages of iPSC 
generation. Some lesions are inherited from 
the cell used for reprogramming. Gore et al.’ 
employ a particularly sensitive approach to
demonstrate that a subset of point mutations
found in iPSC lines pre-existed in a small 
minority of fibroblasts used for reprogram- 
ming. Other lesions seem to arise early on in 
reprogramming, as mentioned previously. For 
example, Hussein et al.’ found large numbers 
of new CNVs during early passages (subcul- 
tures) following reprogramming, but noted 
that subsequent growth in vitro seemingly 
selected against most of the changes, which 
implies that they are deleterious for the cells 
that bear them. The studies also report changes 
that apparently relate to long-term adaptation 
to cell culture. These include over-represen- 
tation either of the short arm of chromosome 
12 (12p) or of this entire chromosome*”, and 
of a subregion in the long arm of chromosome
20 (ref. 5). Both of these changes have been 
observed® in ESC lines, with an increased 
number of 12p being a hallmark of testicular 
germ-cell tumours — the malignant prototype 
of human pluripotent stem cells. 

Third, several of the groups”*” report clues 
to the potential function of the genetic lesions 
that arise in ESCs and iPSCs. For example, 
regions prone to amplification, deletion or 
point mutation seem to be enriched in genes 
involved in cell-cycle regulation and can- 
cer. Although the changes observed do not 
strongly implicate any particular gene func- 
tionally as a target for change during the ampli- 
fication of iPSCs or during their adaptation to 
culture conditions, the frequent association of 
the affected genes with cancer gives cause for 
concern. 

This highly significant body of data’ pro- 
vides a revealing, in-depth portrait of the status 
of the genome and the epigenome during cel- 
lular reprogramming. But it also leaves open 
some fairly challenging questions. 

The studies provide little insight into the 
crucial question of what aspects of the repro- 
gramming methods might predispose the 
cells to the accumulation of recurrent genetic 
or epigenetic lesions. Although recurrence of 
change in specific genomic regions across a 
number of cell lines strongly implies a selec- 
tive process, in several studies the researchers 
noted that there was no obvious correlation 
between the extent of genetic damage in a
given population of reprogrammed cells and
the methods used for their reprogramming 
or propagation. Hussein et al.' provide some 
evidence that CNV occurred more frequently 
at sites prone to replication stress. It is not 
clear, however, whether this stress is unique 
to the reprogramming process, or whether it 
would be common to any experimental situ- 
ation in which a cultured cell is subjected to 
strong selection and replication pressures 
in vitro. 

Despite extensive evaluation of recurrent 
genetic change in a vast number of cell lines, 
we are only slightly closer to identifying which 
particular genes within the larger chromo- 
somal regions that are commonly subject to 
duplication in iPSCs and ESCs might be under 
selection. Years of cytogenetic studies of germ- 
cell tumours have also identified large genomic 
regions that are commonly over-represented 
in these cancers, but the identification of the 
specific genes involved in the transforma- 
tion of these pluripotent cells has remained 
elusive’. A possible interpretation of the data
on the genetics of germ-cell tumours is that
multiple genetic regions, or large regulatory
regions, are crucial to the process of onco-
genesis in vivo. Perhaps a similar mechanism
is in play during in vitro adaptation of ESCs
or iPSCs.

With regard to evaluating the safety of
ESCs and iPSCs, a key issue is the biological
significance of the changes that these stud-
ies’ report. Clearly, aneuploid cell lines
would not be used in therapy (although they
might be useful for research into the basis of
genetic disorders associated with anomalies in
chromosome number or other genetic abnor-
malities). Cell lines bearing mutations of estab-
lished functional consequence in oncogenes
or tumour suppressors, or in genes associated
with Mendelian disorders (those usually due
to a single gene), could equally not be used
therapeutically. However, the many sub-
chromosomal changes, CNVs or point muta-
tions that are not obviously associated with
known disease-related genetic abnormali-
ties pose challenges to interpretation. This is


CLIMATE CHANGE


Rethinking the sea-ice tipping point


Summer sea-ice extent in the Arctic has decreased greatly during recent decades. 
Simulations of twenty-first-century climate suggest that the ice can recover from 
artificially imposed ice-free summer conditions within a couple of years. 


MARK C. SERREZE 


Will the Arctic’s floating cover of sea
ice pass a critical threshold, or tipping
point, beyond which a rapid,
irreversible slide occurs to a seasonally ice-free 
Arctic Ocean? The question is a pertinent one 
bearing on the adaptability of Arctic marine 
life’, how ice loss influences atmospheric cir- 
culation and precipitation patterns within and 
beyond the Arctic’, and prospects for resource 
extraction and marine shipping’. According to 
a new study by Tietsche and colleagues⁴, and
other recent work’, concerns over a tipping 
point may be unfounded. 

That the Arctic is moving towards a season- 
ally ice-free state is clear. Over the period of 
satellite observations (1979 onwards), linear 
trends in the decline of sea-ice extent have 
been recorded for all months. The trends are 
smallest in winter and largest in September, 
the end of the melt season. When referenced 
to a 1979-2000 mean, the rate of decline in 
sea-ice extent for September is about 12% per
decade (Fig. 1). A key driver of this seasonal
asymmetry in trends is that spring ice cover is
increasingly dominated by relatively thin ice
that formed during the previous autumn and 
winter, with less of the generally thicker ice that 
has survived at least one summer-melt period®. 
Because less energy is required to melt out thin 
ice, with other factors equal, the thinner the ice 
in spring, the lower the ice extent at the end 
of summer. Thin spring ice also strengthens 
the seasonal albedo feedback, whereby dark 
(low albedo) open-water areas are exposed to 
the Sun earlier in the melt season, leading to 
stronger seasonal heating of the upper ocean 
that, in turn, helps to melt more ice, exposing 
even more of the dark ocean. 
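
A trend quoted in per cent per decade relative to a baseline mean is simply a least-squares slope rescaled. The sketch below shows the arithmetic on synthetic numbers; the function name `percent_per_decade` and the made-up extents are assumptions for illustration and are not the satellite record.

```python
def percent_per_decade(years, extents, base_start, base_end):
    """Least-squares linear trend of `extents` against `years`,
    expressed as per cent per decade of the mean extent over the
    baseline period base_start..base_end (inclusive)."""
    n = len(years)
    mean_year = sum(years) / n
    mean_extent = sum(extents) / n
    sxx = sum((y - mean_year) ** 2 for y in years)
    sxy = sum((y - mean_year) * (e - mean_extent)
              for y, e in zip(years, extents))
    slope_per_year = sxy / sxx

    baseline = [e for y, e in zip(years, extents)
                if base_start <= y <= base_end]
    baseline_mean = sum(baseline) / len(baseline)
    return 100.0 * 10.0 * slope_per_year / baseline_mean


# Synthetic September extents (million square km), NOT observations:
# a straight-line decline of 0.08 per year from 8.0 in 1979.
years = list(range(1979, 2011))
extents = [8.0 - 0.08 * (y - 1979) for y in years]

trend = percent_per_decade(years, extents, 1979, 2000)
```

For this synthetic series the slope is −0.8 per decade and the 1979–2000 mean is 7.16, so the trend comes out at roughly −11% per decade. Real data scatter about the line, but the rescaling works in the same way.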

Concern over a tipping point stemmed from 
a modelling study’ by Holland and colleagues 
published in 2006. They found that, as the 
climate warmed and the spring sea-ice cover 
thinned in response to rising greenhouse-gas 
levels, a strong kick from natural climate vari- 
ability could more easily induce a reduction 
in sea-ice extent sufficiently large to set the 
albedo feedback process into high gear. As a 
result, the path of a general downward trend 
in summer ice cover would be interrupted 
by sudden plunges spanning a decade or

because it is unclear how best to assess the
effects of new genetic lesions on the growth, 
differentiation, tumorigenicity and functional- 
ity of pluripotent stem cells or their differen- 
tiated progeny. High-throughput functional 
genomics will probably be required to answer 
these questions. Pluripotent cells themselves 
will provide the most promising platform for 
such studies.

Figure 1 | September sea-ice extent in the Arctic
for 1979-2010. Satellite data (blue) show that 
September sea-ice extent is decreasing in the 
Arctic, and that, relative to the 1979-2000 mean, 
the rate of decline is about 12% per decade; green 
line represents the best fit to the satellite data. 
Tietsche and colleagues’ simulations’ indicate that 
the extent can recover from artificially imposed 
ice-free summer conditions within two years. 


more, hastening the slide to a seasonally ice- 
free ocean. The concern was fuelled in 2007 
by a record September minimum in sea-ice 
extent — 23% below the previous record set 
in 2005 — driven by a combination of sev- 
eral decades of sea-ice thinning and a highly 
unusual summer weather pattern. 
Specifically, a combination of especially 
high atmospheric pressure over the Beaufort 
Sea, north of Alaska, in conjunction with low 
pressure over Siberia, drew warm air into the 
Arctic, hastening melt, while at the same time 
helping to transport some of the remaining 
thick ice out of the Arctic into the North Atlan- 
tic Ocean®. Was this the kick initiating a rapid, 
irreversible decline in ice extent? Although 
there was widespread speculation over this

50 Years Ago


The Birds of Borneo. By Bertram E. 
Smythies — This is a very unusual 
bird book. The main body of the 
work (about 460 pages) consists of 
a detailed systematic account of all 
the 549 species of birds that have 
been found in Borneo ... But it is 
the hundred pages that precede this 
excellent treatise that put this book 
in a class apart ... Lord Medway’s
chapter gives a fascinating account 
of the cave swiftlets, the saliva-built 
nests of which are the edible birds’ 
nests of commerce, and which 
echo-navigate in the darkness of the 
caves where millions congregate to 
breed. [Mr. Tom] Harrison remarks 
that Governments “by some 
complicated zoo-geology, claim
the guano as a mineral and allow
extraction (for fertilizer) under 
licence. Thus what comes out of the 
swiftlets’ mouth as spit is succinctly 
dissociated from what comes out of 
the other end”. 

From Nature 4 March 1961 


100 Years Ago 


In the Prussian Diet of February 18, 
Prof. Kirchner ... is reported to 
have said that, during the last few 
weeks, three cases of plague had 
occurred in London, the infection 
being conveyed by ship-rats. This 
statement has been officially denied 
... With regard to rat infection, 
three rats which had probably 
escaped from a ship were examined 
at the London Docks in November 
last, and two of them were found
to be suffering from plague, but
at present there is no evidence of
the existence of a plague epizootic
among rats in the London Docks
area. The destruction of rats ...
is still carried out at the London
Docks, and careful precautions are 
being taken to prevent rats in ships 
from infected ports from escaping 
ashore, and possibly initiating an 
epizootic among the shore rats. 
From Nature 2 March 1911

possibility, the Septembers of 2008 and 2009
instead saw successively higher sea-ice extent. 

One interpretation of this apparent short- 
term recovery is that the spring ice cover needs 
further thinning for a tipping point to occur’. 
An alternative is that there is no true tipping 
point. Tietsche et al.⁴ do not argue against the
mainstream view that a seasonally ice-free 
Arctic Ocean is inevitable if greenhouse-gas 
concentrations continue to rise. The issue 
is how we get there — with or without a 
tipping point. 

Tietsche and colleagues performed a series 
of reference simulation runs with a global cli- 
mate model driven by the middle-of-the-road 
Intergovernmental Panel on Climate Change 
A1B greenhouse-gas emissions scenario for 
the twenty-first century. In these simulations, 
the September ice cover typically disappears 
by the year 2070 and beyond. The authors then 
performed perturbation runs, whereby every 
20 years they artificially removed the entire 
sea-ice cover on 1 July. Instead of maintaining 
ice-free conditions, ice extent in September 
recovered to values typical of the reference 
runs within a couple of years, even in the later 
parts of the century. 

The crux is winter. Initially, with ice-free 
summers, the ocean picks up a great deal of 
extra heat, delaying autumn ice growth. If there 
was a tipping point, this summer heat gain 
would lead to ice cover the following spring 
being thin enough to completely melt out over 
the following summer. Instead, so much ocean 
heat is lost during the darkness of the polar 
winter that enough ice grows to survive the 
next summer’s melt. 


Although the paper by Tietsche and
colleagues⁴ brings a more optimistic view of
the Arctic’s future, the troubling interpreta-
tion from other recent modelling studies is that
periods of rapid twenty-first-century sea-ice
loss, hastening the evolution to ice-free sum-
mers, don't need to be preceded by a critical
threshold of sea-ice thickness, greenhouse-gas
concentration or combination of factors that
lie at the heart of the tipping-point argument’.
As we move through the coming decades and
the climate warms, the ice cover will simply
become more vulnerable to triggers that cause
rapid loss events. So although the tipping-
point argument can perhaps be laid to rest,
we may nevertheless be looking at ice-free
summers only a few decades from now.


MOLECULAR BIOLOGY


The expanding arena of DNA repair


The protein Sae2 mediates the repair of double-strand breaks in DNA. It emerges 
that Sae2 activity is controlled by both its modification with acetyl groups and its 
degradation by the process of autophagy. SEE ARTICLE P.74 


CATHERINE J. POTENSKI & HANNAH L. KLEIN 


Cells use myriad ways to regulate the
complex processes involved in their
function. To control protein activity
and stability, for example, an oft-used
mechanism is post-translational modification
of the protein. On page 74 of this issue,
Robert et al.¹ report one such modification
that links the seemingly unrelated processes
of DNA-damage repair and autophagy. Their
observations simultaneously highlight the
depth of cellular ingenuity and the immense
interconnectedness of biological pathways.

The authors began by examining the effect 
of a specific post-translational modification 
— protein acetylation, in which an acetyl 
group is added to a protein. They used the 
drug valproic acid (VPA) to inhibit histone 
deacetylase (HDAC) enzymes, thereby caus- 
ing hyperacetylation of histone proteins and 
reduced HDAC activity’. This treatment had 
no effect on cells, but after exposure to various 
DNA-damaging agents, the apparently normal 
VPA-treated cells were unable to activate the 
typical response to DNA damage. 


Robert et al. present several lines of evidence 
to explain the failure of the DNA-damage 
response, including breakdown of the cell- 
cycle checkpoint mechanism. Although mal- 
function of cell-cycle checkpoints could be due 
to mishaps at any step in DNA repair, it is a 
strong indicator of a failure to properly process
DNA double-strand breaks (DSBs). Indeed, 
the authors show that VPA-treated cells could 
not correctly repair such breaks. 

Several proteins are responsible for repairing 
DSBs, among them Sae2, Exo1, the MRX/N
complex, Sgs1 (BLM) and Dna2. Robert et al.
report a significant reduction in the association
of Sae2 and Exo1 with broken DNA ends
in VPA-treated cells. More intriguingly, the 
cellular levels of the two proteins were severely 
reduced in these cells, which would explain 
the failure of DSB repair. But why would VPA 
treatment affect Sae2 and Exo1 levels?

An analysis’ of all cellular proteins modi- 
fied by acetylation of their lysine amino-acid 
residues identified some that mediate DNA 
repair, including Exo1. Although Sae2 was
not among the acetylated proteins identified’, 
the study did point to the possibility that Sae2 
is acetylated, and that its acetylated version is 
unstable. Deacetylation by HDACs would con- 
vert Sae2 to a stable form, but VPA treatment 
inhibits this. 

Indeed, Robert and co-workers’ also find 
that Sae2 can be acetylated and that two 
HDACs (Hda1 and Rpd3, which have similar
functions) promote its deacetylation.
What's more, depletion of these two HDACs 
had similar effects to VPA treatment, leading 
to negligible Sae2 levels, heightened DNA- 
damage sensitivity and failure to activate 
cell-cycle checkpoints. Consistent with these 
observations, a recent study” showed that 
the lysine deacetylase enzyme SIRT6 posi- 
tively regulates the repair of DSBs through 
deacetylation of CtIP, the mammalian form 
of Sae2. 

But this is only half the story. Why would 
inhibition of deacetylation (in other words, 
acetylation) destabilize Sae2? Considering the 
consequences of treating mammalian cells 
with VPA and other HDAC inhibitors”®”, 
Robert and colleagues propose — and pro- 
vide evidence in yeast — that VPA stimulates 
autophagy, a degradation process that is nor- 
mally linked to the cellular response to star- 
vation and to organelle turnover. The authors 
further show that mutant cells defective in 
autophagy could overcome the inability of 
HDAC mutants to repair DSBs, presumably 
by stabilizing Sae2. They also saw the same 
spectrum of traits in a histone acetyltransferase 
mutant, which could not acetylate Sae2. 

These findings are particularly noteworthy 
because they bring autophagy into the DNA- 
repair network. Autophagy can be triggered 
by inhibition of the enzyme TOR1 kinase.
Indeed, Robert et al. find that inhibiting this 
enzyme with the drug rapamycin induces
autophagy and results in destabilization
of Sae2.

[Figure 1 diagram. Labels: nucleus, nuclear pore, vacuole, DSB formation, DNA repair, HDAC, VPA.]

Figure 1 | Possible role of Sae2 in DNA-damage repair¹. Following the formation of a DNA double-
strand break (DSB), the damaged DNA is moved to the nuclear pore to protect the bulk undamaged
DNA from unnecessary processing by the DNA-repair machinery. In its deacetylated form, one such
mediator, Sae2 (S), together with the MRX/N complex, binds to the DNA ends to aid the initiation of
the end-resection step of repair. Subsequently, Sae2 might be released as a macromolecular complex
and acetylated by histone acetylases (HATs), which promote its expulsion through the nuclear pore and
ultimately its degradation by the process of autophagy. Valproic acid (VPA) inhibits Sae2 deacetylation by
blocking the activity of histone deacetylase (HDAC) enzymes.

From their results, the authors propose 
the following model (Fig. 1). Severely dam- 
aged DNA — such as DSBs that are difficult 
to repair — might become sequestered at 
the nuclear pores’, possibly to keep the cell’s 
repair enzymes away from the bulk DNA that, 
although undamaged, could contain struc- 
tures such as DNA nicks and gaps as a normal 
consequence of DNA replication. If the repair 
enzymes were close by, they could mistake 
these structures for damage and ‘repair’ them, 
leading to mutations and rearrangements. 

In the case of Sae2, for example, although 
HDACs maintain it in an active state for 
repairing damaged DNA, at some point it will 
become acetylated and be expelled through 
the nuclear pore to the vacuole — the cellular 
site of autophagic degradation in yeast. Thus, 
a potentially dangerous enzyme is eliminated, 
preventing it from unnecessarily repairing rep- 
lication-associated structures such as stalled 
replication forks. It remains unclear whether 
Sae2 is targeted for degradation after the 
completion of DNA repair and whether a 
checkpoint signal is involved. 

Many further questions remain. Do other 
post-translational modifications, such as ubiq- 
uitination, phosphorylation and SUMOyla- 
tion, regulate Sae2? Because of its association 
with autophagy, ubiquitination is most likely 
to play a part. But how does this relate to 
acetylation? 

Also, several proteins that mediate the 
DNA-damage response (including Cdk1, Ku,
MRN, Blm and Rfa1) are acetylated. Which
of these are controlled through acetylation and 
autophagic degradation? Are nuclear pores an 
essential component of acetylation-promoted 
autophagy of DNA-repair complexes? Does 
a similar regulatory process occur during 
meiotic cell division, when DSBs abound and 
their repair is carefully choreographed? Is there 
an intranuclear cycle of acetylation–deacetylation,
or is all acetylated Sae2 targeted for
degradation? 

These questions must be tackled before 
researchers can embark on exploring how this 
newly identified layer of DNA-damage regula- 
tion can be exploited to find targets for cancer 
therapy — a setting in which cells experience 
more DNA damage than usual.
Has the Earth’s sixth mass extinction already arrived?


Anthony D. Barnosky, Nicholas Matzke, Susumu Tomiya, Guinevere O. U. Wogan, Brian Swartz, Tiago B. Quental,
Charles Marshall, Jenny L. McGuire, Emily L. Lindsey, Kaitlin C. Maguire, Ben Mersey & Elizabeth A. Ferrer


Palaeontologists characterize mass extinctions as times when the Earth loses more than three-quarters of its species in a 
geologically short interval, as has happened only five times in the past 540 million years or so. Biologists now suggest that a 
sixth mass extinction may be under way, given the known species losses over the past few centuries and millennia. Here 
we review how differences between fossil and modern data and the addition of recently available palaeontological 
information influence our understanding of the current extinction crisis. Our results confirm that current extinction 
rates are higher than would be expected from the fossil record, highlighting the need for effective conservation measures. 


Of the four billion species estimated to have evolved on the Earth
over the last 3.5 billion years, some 99% are gone’. That shows 
how very common extinction is, but normally it is balanced by 
speciation. The balance wavers such that at several times in life’s history 
extinction rates appear somewhat elevated, but only five times qualify 
for ‘mass extinction’ status: near the end of the Ordovician, Devonian, 
Permian, Triassic and Cretaceous Periods”. These are the ‘Big Five’ 
mass extinctions (two are technically ‘mass depletions’)*. Different 
causes are thought to have precipitated the extinctions (Table 1), and 
the extent of each extinction above the background level varies depend- 
ing on analytical technique*”, but they all stand out in having extinction 
rates spiking higher than in any other geological interval of the last ~540 
million years’ and exhibiting a loss of over 75% of estimated species’. 
Increasingly, scientists are recognizing modern extinctions of species®” 
and populations*’. Documented numbers are likely to be serious under- 
estimates, because most species have not yet been formally described’. 
Such observations suggest that humans are now causing the sixth mass 
extinction, through co-opting resources, fragmenting habitats,
introducing non-native species, spreading pathogens, killing species
directly, and changing global climate. If so, recovery of biodiversity
will not occur on any timeframe meaningful to people: evolution of new
species typically takes at least hundreds of thousands of years, and
recovery from mass extinction episodes probably occurs on timescales
encompassing millions of years.

Although there are many definitions of mass extinction and gradations
of extinction intensity, here we take a conservative approach to
assessing the seriousness of the ongoing extinction crisis, by setting a
high bar for recognizing mass extinction, that is, the extreme diversity
loss that characterized the very unusual Big Five (Table 1). We find that
the Earth could reach that extreme within just a few centuries if current
threats to many species are not alleviated.


Table 1 | The ‘Big Five’ mass extinction events

Event: The Ordovician event ended ~443 Myr ago; within 3.3 to 1.9 Myr 57% of genera were lost, an estimated 86% of species.
Proposed causes: Onset of alternating glacial and interglacial episodes; repeated marine transgressions and regressions. Uplift and weathering of the Appalachians affecting atmospheric and ocean chemistry. Sequestration of CO2.

Event: The Devonian event ended ~359 Myr ago; within 29 to 2 Myr 35% of genera were lost, an estimated 75% of species.
Proposed causes: Global cooling (followed by global warming), possibly tied to the diversification of land plants, with associated weathering, paedogenesis, and the drawdown of global CO2. Evidence for widespread deep-water anoxia and the spread of anoxic waters by transgressions. Timing and importance of bolide impacts still debated.

Event: The Permian event ended ~251 Myr ago; within 2.8 Myr to 160 Kyr 56% of genera were lost, an estimated 96% of species.
Proposed causes: Siberian volcanism. Global warming. Spread of deep marine anoxic waters. Elevated H2S and CO2 concentrations in both marine and terrestrial realms. Ocean acidification. Evidence for a bolide impact still debated.

Event: The Triassic event ended ~200 Myr ago; within 8.3 Myr to 600 Kyr 47% of genera were lost, an estimated 80% of species.
Proposed causes: Activity in the Central Atlantic Magmatic Province (CAMP) thought to have elevated atmospheric CO2 levels, which increased global temperatures and led to a calcification crisis in the world oceans.

Event: The Cretaceous event ended ~65 Myr ago; within 2.5 Myr to less than a year 40% of genera were lost, an estimated 76% of species.
Proposed causes: A bolide impact in the Yucatan is thought to have led to a global cataclysm and caused rapid cooling. Preceding the impact, biota may have been declining owing to a variety of causes: Deccan volcanism contemporaneous with global warming; tectonic uplift altering biogeography and accelerating erosion, potentially contributing to ocean eutrophication and anoxic episodes. CO2 spike just before extinction, drop during extinction.

Myr, million years. Kyr, thousand years.


Data disparities

Only certain kinds of taxa (primarily those with fossilizable hard parts)
and a restricted subset of the Earth’s biomes (generally in temperate
latitudes) have data sufficient for direct fossil-to-modern comparisons
BOX 1
Severe data comparison problems


Geography 

The fossil record is very patchy, sparsest in upland environments and tropics, but modern global distributions are known for many species. 

A possible comparative technique could be to examine regions or biomes where both fossil and modern data exist—such as the near-shore marine 
realm including coral reefs and terrestrial depositional lowlands (river valleys, coastlines, and lake basins). Currently available databases® could be 
used to identify modern taxa with geographic ranges indicating low fossilization potential and then extract them from the current-extinction equation.

Taxa available for study

The fossil record usually includes only species that possess identifiable anatomical hard parts that fossilize well. Theoretically all living species 
could be studied, but in practice extinction analyses often rely on the small subset of species evaluated by the IUCN. Evaluation following IUCN 
procedures places species in one of the following categories: extinct (EX), extinct in the wild (EW), critically endangered (CR), endangered (EN), 
vulnerable (VU), near threatened (NT), least concern (LC), or data deficient (DD, information insufficient to reliably determine extinction risk). Species 
in the EX and EW categories are typically counted as functionally extinct. Those in the CR plus EN plus VU categories are counted as ‘threatened’. 
Assignment to CR, EN or VU is based on how high the risk of extinction is determined to be using five criteria** (roughly, CR probability of extinction 
exceeds 0.50 in ten years or three generations; EN probability of extinction exceeds 0.20 in 20 years or five generations; VU probability of extinction 
exceeds 0.10 over a century”’). 

A possible comparative technique could be to use taxa best known in both fossil and modern records: near-shore marine species with shells, 
lowland terrestrial vertebrates (especially mammals), and some plants. This would require improved assessments of modern bivalves and 
gastropods. Statistical techniques could be used to clarify how a subsample of well-assessed taxa extrapolates to undersampled and/or poorly 
assessed taxa?®. 
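
The category groupings described above can be tallied in a few lines. The following is only an illustrative sketch: the species names and category assignments are hypothetical toy data, and counting every assessed species (including data-deficient ones) in the denominator is an assumption made here, not a rule stated in the text.

```python
from collections import Counter

# Groupings as described in the box: EX and EW are counted as functionally
# extinct; CR, EN and VU together are counted as 'threatened'.
FUNCTIONALLY_EXTINCT = {"EX", "EW"}
THREATENED = {"CR", "EN", "VU"}


def summarize(assessments):
    """Summarize a mapping of species name -> IUCN category code.

    Assumption: all assessed species, including DD, form the denominator.
    """
    counts = Counter(assessments.values())
    total = sum(counts.values())
    extinct = sum(counts[c] for c in FUNCTIONALLY_EXTINCT)
    threatened = sum(counts[c] for c in THREATENED)
    return {
        "assessed": total,
        "extinct_pct": 100.0 * extinct / total,
        "threatened_pct": 100.0 * threatened / total,
    }


# Hypothetical toy data, not real assessments.
toy = {
    "sp_a": "EX", "sp_b": "CR", "sp_c": "EN", "sp_d": "LC", "sp_e": "LC",
    "sp_f": "VU", "sp_g": "NT", "sp_h": "DD", "sp_i": "LC", "sp_j": "EW",
}

if __name__ == "__main__":
    print(summarize(toy))
```

Whether data-deficient species belong in the denominator changes the percentages, so a real analysis would need to state that choice explicitly.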

Taxonomy 

Analyses of fossils are often done at the level of genus rather than species. When species are identified they are usually based on a morphological
species concept. This can result in lumping species together that are distinct, or, if incomplete fossil material is used, over-splitting species. For 
modern taxa, analyses are usually done at the level of species, often using a phylogenetic species concept, which probably increases species 
counts relative to morphospecies. 

A possible comparative technique would be to aggregate modern phylogenetic species into morphospecies or genera before comparing with the 
fossil record. 

Assessing extinction 

Fossil extinction is recorded when a taxon permanently disappears from the fossil record and underestimates the actual number of extinctions (and 
number of species) because most taxa have no fossil record. The actual time of extinction almost always postdates the last fossil occurrence. Modern 
extinction is recorded when no further individuals of a species are sighted after appropriate efforts. In the past few decades designation as ‘extinct’ 
usually follows IUCN criteria, which are conservative and likely to underestimate functionally extinct species**. Modern extinction is also 
underestimated because many species are unevaluated or undescribed. 

A possible comparative technique could be to standardize extinction counts by number of species known per time interval of interest (proportional
extinction). However, fossil data demonstrate that background rates can vary widely from one taxon to the next?°*8’, so fossil-to-modern extinction 
rate comparisons are most reliably done on a taxon-by-taxon basis, using well-known extant clades that also have a good fossil record. 

Time 

In the fossil record sparse samples of species are discontinuously distributed through vast time spans, from 10° to 10® years. In modern times we 

have relatively dense samples of species over very short time spans of years, decades and centuries. Holocene fossils are becoming increasingly 


available and valuable in linking the present with the past*®°°. 


A possible comparative technique would be to scale proportional extinction relative to the time interval over which extinction is measured. 


(Box 1). Fossils are widely acknowledged to be a biased and incomplete 
sample of past species, but modern data also have important biases that, 
if not accounted for, can influence global extinction estimates. Only a 
tiny fraction (<2.7%) of the approximately 1.9 million named, extant 
species have been formally evaluated for extinction status by the 
International Union for Conservation of Nature (IUCN). These IUCN 
compilations are the best available, but evaluated species represent just a 
few twigs plucked from the enormous number of branches that compose 
the tree of life. Even for clades recorded as 100% evaluated, many species 
still fall into the Data Deficient (DD) category”. Also relevant is that not 
all of the partially evaluated clades have had their species sampled in the 
same way: some are randomly subsampled”, and others are evaluated as 
opportunity arises or because threats seem apparent. Despite the limita- 
tions of both the fossil and modern records, by working around the 
diverse data biases it is possible to avoid errors in extrapolating from 
what we do know to inferring global patterns. Our goal here is to high- 
light some promising approaches (Table 2). 


Defining mass extinctions relative to the Big Five 


Extinction involves both rate and magnitude, which are distinct but 
intimately linked metrics”*. Rate is essentially the number of extinctions 
divided by the time over which the extinctions occurred. One can also 
derive from this a proportional rate, the fraction of species that have
gone extinct per unit time. Magnitude is the percentage of species that 
have gone extinct. Mass extinctions were originally diagnosed by rate: 
the pace of extinction appeared to become significantly faster than 
background extinction’. Recent studies suggest that the Devonian and 
Triassic events resulted more from a decrease in origination rates than 
an increase in extinction rates*’. Either way, the standing crop of the 
Earth’s species fell by an estimated 75% or more’. Thus, mass extinction, 
in the conservative palaeontological sense, is when extinction rates 
accelerate relative to origination rates such that over 75% of species 
disappear within a geologically short interval—typically less than 2 million 
years, in some cases much less (see Table 1). Therefore, to document 
where the current extinction episode lies on the mass extinction scale 
defined by the Big Five requires us to know both whether current extinc- 
tion rates are above background rates (and if so, how far above) and how 
closely historic and projected biodiversity losses approach 75% of the 
Earth’s species. 
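
The three metrics defined in this section (rate, proportional rate and magnitude) can be written out as a minimal sketch. The function names and the numbers in the example are hypothetical and chosen only to show how the 75% benchmark would be applied; they are not data from this Review.

```python
def extinction_rate(n_extinct, years):
    """Rate: number of extinctions divided by the time over which they occurred."""
    return n_extinct / years


def proportional_rate(n_extinct, n_species, years):
    """Proportional rate: fraction of species going extinct per unit time."""
    return (n_extinct / n_species) / years


def magnitude_pct(n_extinct, n_species):
    """Magnitude: percentage of species that have gone extinct."""
    return 100.0 * n_extinct / n_species


def exceeds_big_five_benchmark(n_extinct, n_species):
    """True when the loss is over 75% of species, the benchmark used in the text."""
    return magnitude_pct(n_extinct, n_species) > 75.0


if __name__ == "__main__":
    # Hypothetical example: 800 of 1,000 species lost over 2 million years.
    print(extinction_rate(800, 2_000_000))          # extinctions per year
    print(proportional_rate(800, 1_000, 2_000_000)) # fraction of species per year
    print(magnitude_pct(800, 1_000))                # percentage lost
    print(exceeds_big_five_benchmark(800, 1_000))
```

The sketch makes the distinction concrete: magnitude ignores time entirely, whereas both rates depend on the interval chosen, which is why the two must be assessed together.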


Background rate comparisons 

Landmark studies'*'*"” that highlighted a modern extinction crisis 
estimated current rates of extinction to be orders of magnitude higher 
than the background rate (Table 2). A useful and widely applied metric 
has been E/MSY (extinctions per million species-years, as defined in refs
15 and 27). In this approach, background rates are estimated from fossil
extinctions that took place in million-year-or-more time bins. For current
rates, the proportion of species extinct in a comparatively very short
time (one to a few centuries) is extrapolated to predict what the rate
would be over a million years. However, both theory and empirical data
indicate that extinction rates vary markedly depending on the length of
time over which they are measured. Extrapolating a rate computed
over a short time, therefore, will probably yield a rate that is either much
faster or much slower than the average million-year rate, so current rates
that seem to be elevated need to be interpreted in this light.


Table 2 | Methods of comparing present and past extinctions

General method: Compare currently measured extinction rates to background rates assessed from fossil record
  Variation: E/MSY*‡. References: 7, 10, 15, 27, 62
  Variation: Comparative species duration (estimates species durations to derive an estimate of extinction rate)*‡. References: 14
  Variation: Fuzzy Math*‡. References: 44, 80
  Variation: Interval-rate standardization (empirical derivation of relationship between rate and interval length over which extinction is measured provides context for interpreting short-term rates)‡. References: This paper

General method: Use various modelling techniques, including species-area relationships, to assess loss of species
  Variation: Compare rate of expected near-term future losses to estimated background extinction rates*†‡. References: 7, 10, 14, 15
  Variation: Assess magnitude of past species losses†‡. References: 42, 45
  Variation: Predict magnitude of future losses. Ref. 7 explores several models and provides a range of possible outcomes using different impact storylines†‡. References: 7, 14, 15, 27, 36, 62, 81-84

General method: Compare currently measured extinction rates to mass-extinction rates
  Variation: Use geological data and hypothetical scenarios to bracket the range of rates that could have produced past mass extinctions, and compare with current extinction rates (assumes Big Five mass extinctions were sudden, occurring within 500 years, producing a ‘worst-case scenario’ for high rates, but with the possible exception of the Cretaceous event, it is unlikely that any of the Big Five were this fast). References: This paper

General method: Assess extinction in context of long-term clade dynamics
  Variation: Map projected extinction trajectories onto long-term diversification/extinction trends in well-studied clades‡. References: This paper

General method: Assess percentage loss of species
  Variation: Use IUCN lists to assess magnitude or rate of actual and potential species losses in well-studied taxa‡. References: This paper and refs 6, 7, 10, 14, 15, 20, 36 and 62

General method: Use molecular phylogenies to estimate extinction rate
  Variation: Calculate background extinction rates from time-corrected molecular phylogenies of extant species, and compare to modern rates. References: 85

Fuzzy Math attempts to account for different biases in fossil and modern samples and uses empirically based fossil background extinction rates as a standard for comparison: 0.25 species per million years for marine invertebrates, determined from the ‘kill-curve’ method, and 0.21 species to 0.46 species per million years for North American mammals, determined from applying maximum-likelihood techniques.
The molecular phylogenies method assumes that diversification rates are constant through time and can be partitioned into originations and extinctions without evidence from the fossil record. Recent work has demonstrated that disentanglement of diversification from extinction rates by this method is difficult, particularly in the absence of a fossil record, and that extinction rates estimated from molecular phylogenies of extant organisms are highly unreliable when diversification rates vary among lineages through time.
* Comparison of modern short-term rates with fossil long-term rates indicates highly elevated modern rates, but does not take into account interval-rate effect.
† Assumes that the relationship between number and kind of species lost in study area can be scaled up to make global projections.
‡ Assumes that conclusions from well-studied taxa illustrate general principles.

Only recently has it become possible to do this by using palaeontology 
databases*’*! combined with lists of recently extinct species. The most 
complete data set of this kind is for mammals, which verifies the efficacy 
of E/MSY by setting short-interval and long-interval rates in a comparative 
context (Fig. 1). A data gap remains between about one million and about 
50 thousand years because it is not yet possible to date extinctions in that 
time range with adequate precision. Nevertheless, the overall pattern is as 
expected: the maximum E/MSY and its variance increase as measurement 
intervals become shorter. The highest rates are rare but low rates are 
common; in fact, at time intervals of less than a thousand years, the most 
common E/MSY is 0. Three conclusions emerge. (1) The maximum 
observed rates since a thousand years ago (E/MSY ~ 24 in 1,000-year bins 
to E/MSY ~ 693 in 1-year bins) are clearly far above the average fossil rate 
(about E/MSY ~ 1.8), and even above those of the widely recognized late- 
Pleistocene megafaunal diversity crash’*’? (maximum E/MSY ~ 9, red 
data points in Fig. 1). (2) Recent average rates are also too high compared 
to pre-anthropogenic averages: E/MSY increases to over 5 (and rises to 
23) in less-than-50-year time bins. (3) In the scenario where currently 
‘threatened’ species** would ultimately go extinct even in as much as a 
thousand years, the resulting rates would far exceed any reasonable 
estimation of the upper boundary for variation related to interval length. 
The same applies if the extinction scenario is restricted to only ‘critically 
endangered’ species**. This does not imply that we consider all species in 
these categories to be inevitably destined for extinction—simply that in a 
worst-case scenario where that occurred, the extinction rate for mammals
would far exceed normal background rates. Because our computational
method maximizes the fossil background rates and minimizes the current 
rates (see Fig. 1 caption), our observation that modern rates are elevated is 
likely to be particularly robust. Moreover, for reasons argued by others”, 
the modern rates we computed probably seriously underestimate current 
E/MSY values. 
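
The dependence of E/MSY on measurement interval can be illustrated with a minimal sketch. It assumes that species-years are approximated as standing diversity multiplied by interval length, which is one plain reading of the metric rather than the exact procedure of refs 15 and 27, and the species counts used are hypothetical.

```python
def e_msy(n_extinct, n_species, interval_years):
    """Extinctions per million species-years.

    Assumption: species-years = standing diversity x interval length.
    """
    species_years = n_species * interval_years
    return n_extinct / species_years * 1_000_000


if __name__ == "__main__":
    # Hypothetical clade of 5,000 species with a single recorded extinction.
    n_species = 5_000
    for interval in (1, 100, 1_000, 1_000_000):
        rate = e_msy(1, n_species, interval)
        print(f"1 extinction in a {interval:>9,}-year bin: E/MSY = {rate:.4f}")
    # A bin with no extinctions always gives E/MSY = 0, whatever its length.
    print(e_msy(0, n_species, 1))
```

Under these assumptions a single extinction yields a very high E/MSY in a one-year bin and a very low one in a million-year bin, while most short bins contain no extinctions at all. That is consistent with the pattern described above, in which the maximum and the variance rise as bins shorten and the most common short-interval value is 0.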

Another approach is simply to ask whether it is likely that extinction 
rates could have been as high in many past 500-year intervals as they 
have been in the most recent 500 years. Where adequate data exist, as is 
the case for our mammal example, the answer is clearly no. The mean 
per-million-year fossil rate for mammals we determined (Fig. 1) is about 
1.8 E/MSY. To maintain that million-year average, there could be no 
more than 6.3% of 500-year bins per million years (126 out of a possible 
2,000) with an extinction rate as high as that observed over the past 500 
years (80 extinct of 5,570 species living in 500 years). Million-year 
extinction rates calculated by others, using different techniques, are 
slower: 0.4 extinctions per lineage per million years (a lineage in this 
context is roughly equivalent to a species)*°. To maintain that slower 
million-year average, there could be no more than 1.4% (28 intervals) of 
the 500-year intervals per million years having an extinction rate as high 
as the current 500-year rate. Rates computed for shorter time intervals 
would be even less likely to fall within background levels, for reasons 
noted by ref. 27. 
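
The arithmetic behind these percentages can be retraced as follows. This is a sketch of one way to reproduce the figures, assuming that every 500-year bin not running at the recent rate contributes no extinctions at all, which gives the upper limit on how many high-rate bins a given million-year mean can accommodate. The function names are illustrative.

```python
def e_msy(n_extinct, n_species, interval_years):
    """Extinctions per million species-years (species-years = species x years)."""
    return n_extinct / (n_species * interval_years) * 1_000_000


def max_fraction_of_high_rate_bins(mean_rate, high_rate):
    """Largest share of bins that can run at `high_rate` while the long-term
    mean stays at `mean_rate`, assuming all remaining bins have zero extinctions."""
    return mean_rate / high_rate


# Figures quoted in the text: 80 extinct of 5,570 species in 500 years.
recent_rate = e_msy(80, 5_570, 500)

# Mean fossil rates quoted in the text: about 1.8 E/MSY (this study) and
# 0.4 extinctions per lineage per million years (other estimates).
share_high = max_fraction_of_high_rate_bins(1.8, recent_rate)
share_low = max_fraction_of_high_rate_bins(0.4, recent_rate)

if __name__ == "__main__":
    print(f"recent 500-year rate: {recent_rate:.1f} E/MSY")
    print(f"maximum share of bins at mean 1.8: {100 * share_high:.1f}%")
    print(f"maximum share of bins at mean 0.4: {100 * share_low:.1f}%")
```

The shares come out at roughly 6.3% and 1.4%, matching the percentages in the text; the bin counts of 126 and 28 quoted there correspond to those rounded percentages applied to the 2,000 bins of 500 years in a million years.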


Magnitude 

Comparisons of percentage loss of species in historical times®*® to the 
percentage loss that characterized each of the Big Five (Fig. 2) need to 
be refined by compensating for many differences between the modern and 
the fossil records*’”°. Seldom taken into account is the effect of using 
different species concepts (Box 1), which potentially inflates the numbers 
of modern species relative to fossil species**°. A second, related caveat is 
that most assessments of fossil diversity are at the level of genus, not 
species***”**1, Fossil species estimates are frequently obtained by calculat- 
ing the species-to-genus ratio determined for well-known groups, then 
extrapolating that ratio to groups for which only genus-level counts exist. 
The over-75% benchmark for mass extinction is obtained in this way’. 
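
The species-to-genus extrapolation described above can be sketched in its simplest form. The counts below are hypothetical, and pooling the well-known groups into a single ratio is an assumption made for illustration; published estimates of species loss rely on more elaborate procedures than this.

```python
def species_per_genus(well_known_groups):
    """Pooled species-to-genus ratio from (species_count, genus_count) pairs."""
    total_species = sum(species for species, _ in well_known_groups)
    total_genera = sum(genera for _, genera in well_known_groups)
    return total_species / total_genera


def estimate_species(genus_count, ratio):
    """Extrapolate a species count for a group known only at genus level."""
    return genus_count * ratio


if __name__ == "__main__":
    # Hypothetical well-known groups: (species, genera).
    known = [(120, 40), (90, 30)]
    ratio = species_per_genus(known)
    print(ratio)
    # Hypothetical poorly known group with 50 genera recorded.
    print(estimate_species(50, ratio))
```

Any error in the ratio carries straight through to the species estimate, which is one reason the choice of species concept discussed above matters for the comparison.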


3 MARCH 2011 | VOL 471 | NATURE | 53 


©2011 Macmillan Publishers Limited. All rights reserved 


[Figure 1 plot. Horizontal axis: time-interval length (years). Point labels: Cenozoic fossils; Pleistocene; extinctions since 2010; minus bats and endemics; CR, EN and VU projections.]

Figure 1 | Relationship between extinction rates and the time interval over 
which the rates were calculated, for mammals. Each small grey datum point 
represents the E/MSY (extinction per million species-years) calculated from 
taxon durations recorded in the Paleobiology Database”? (million-year-or- 
more time bins) or from lists of extant, recently extinct, and Pleistocene species 
compiled from the literature (100,000-year-and-less time bins)******”*”. More 
than 4,600 data points are plotted and cluster on top of each other. Yellow 
shading encompasses the ‘normal’ (non-anthropogenic) range of variance in 
extinction rate that would be expected given different measurement intervals; 
for more than 100,000 years, it is the same as the 95% confidence interval, but 
the fading to the right indicates that the upper boundary of ‘normal’ variance 
becomes uncertain at short time intervals. The short horizontal lines indicate 
the empirically determined mean E/MSY for each time bin. Large coloured dots 
represent the calculated extinction rates since 2010. Red, the end-Pleistocene 
extinction event. Orange, documented historical extinctions averaged (from 
right to left) over the last 1, 30, 50, 70, 100, 500, 1,000 and 5,000 years. Blue, 
attempts to enhance comparability of modern with fossil data by adjusting for 
extinctions of species with very low fossilization potential (such as those with 
very small geographic ranges and bats). For these calculations, ‘extinct’ and 
‘extinct in the wild’ species that had geographic ranges less than 500 km² as
recorded by the IUCN’, all species restricted to islands of less than 105 km², and
bats were excluded from the counts (under-representation of bats as fossils is 
indicated by their composing only about 2.5% of the fossil species count, versus 
around 20% of the modern species count”). Brown triangles represent the 
projections of rates that would result if ‘threatened’ mammals go extinct within 
100, 500 or 1,000 years. The lowest triangle (of each vertical set) indicates the 
rate if only ‘critically endangered’ species were to go extinct (CR), the middle 
triangle indicates the rate if ‘critically endangered’ + ‘endangered’ species were 
to go extinct (EN), and the highest triangle indicates the rate if ‘critically 
endangered’ + ‘endangered’ + ‘vulnerable’ species were to go extinct (VU). To 
produce Fig. 1 we first determined the last-occurrence records of Cenozoic 
mammals from the Paleobiology Database”, and the last occurrences of 
Pleistocene and Holocene mammals from refs 6, 32, 33 and 89-97. We then 
used R-scripts (written by N.M.) to compute total diversity, number of 
extinctions, proportional extinction, and E/MSY (and its mean) for time-bins 
of varying duration. Cenozoic time bins ranged from 25 million to a million 
years. Pleistocene time bins ranged from 100,000 to 5,000 years, and Holocene 
time bins from 5,000 years to a year. For Cenozoic data, the mean E/MSY was 
computed using the average within-bin standing diversity, which was 
calculated by counting all taxa that cross each 100,000-year boundary within a 
million-year bin, then averaging those boundary-crossing counts to compute 
standing diversity for the entire million-year-and-over bin. For modern data, 
the mean was computed using the total standing diversity in each bin (extinct 
plus surviving taxa). This method may overestimate the fossil mean extinction 
rate and underestimate the modern means, so it is a conservative comparison in 
terms of assessing whether modern means are higher. The Cenozoic data are for 
North America and the Pleistocene and Holocene data are for global extinction; 
adequate global Cenozoic data are unavailable. There is no apparent reason to 
suspect that the North American average would differ from the global average 
at the million-year timescale. 
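
The boundary-crossing calculation described in this caption can be sketched in Python (the original analysis used R scripts that are not reproduced here). The taxon ranges below are hypothetical. Treating only the interior 100,000-year boundaries of a bin as counting points, and requiring a taxon to strictly span a boundary, are assumptions made for this illustration.

```python
def boundary_crossers(taxa, boundary):
    """Count taxa whose range spans `boundary`.

    Each taxon is (first_appearance, last_occurrence) in years before present,
    so first_appearance is the larger number.
    """
    return sum(1 for first, last in taxa if first > boundary > last)


def mean_standing_diversity(taxa, bin_old, bin_young, step=100_000):
    """Average the boundary-crossing counts over the interior boundaries of a bin."""
    boundaries = range(bin_young + step, bin_old, step)
    counts = [boundary_crossers(taxa, b) for b in boundaries]
    return sum(counts) / len(counts)


def bin_e_msy(taxa, bin_old, bin_young, step=100_000):
    """E/MSY for one bin: extinctions per (mean standing diversity x bin length)."""
    n_extinct = sum(1 for _, last in taxa if bin_young <= last < bin_old)
    diversity = mean_standing_diversity(taxa, bin_old, bin_young, step)
    return n_extinct / (diversity * (bin_old - bin_young)) * 1_000_000


# Hypothetical ranges for three taxa (years before present).
toy_taxa = [
    (2_500_000, 500_000),    # spans the whole bin
    (1_950_000, 1_450_000),  # goes extinct inside the bin
    (1_250_000, 900_000),    # originates inside the bin, survives past it
]

if __name__ == "__main__":
    print(mean_standing_diversity(toy_taxa, 2_000_000, 1_000_000))
    print(bin_e_msy(toy_taxa, 2_000_000, 1_000_000))
```

Averaging boundary crossers gives a lower standing diversity than counting every taxon present in the bin, which raises the computed fossil rate and is in keeping with the caption's remark that the comparison is conservative.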
Figure 2 | Extinction magnitudes of IUCN-assessed taxa⁶ in comparison to
the 75% mass-extinction benchmark. Numbers next to each icon indicate 
percentage of species. White icons indicate species ‘extinct’ and ‘extinct in the 
wild’ over the past 500 years. Black icons add currently ‘threatened’ species to 
those already ‘extinct’ or ‘extinct in the wild’; the amphibian percentage may be as 
high as 43% (ref. 19). Yellow icons indicate the Big Five species losses: Cretaceous 
+ Devonian, Triassic, Ordovician and Permian (from left to right). Asterisks 
indicate taxa for which very few species (less than 3% for gastropods and bivalves) 
have been assessed; white arrows show where extinction percentages are probably 
inflated (because species perceived to be in peril are often assessed first). The 
number of species known or assessed for each of the groups listed is: Mammalia
5,490/5,490; Aves (birds) 10,027/10,027; Reptilia 8,855/1,677; Amphibia 6,285/6,285;
Actinopterygii 24,000/5,826; Scleractinia (corals) 837/837; Gastropoda
85,000/2,319; Bivalvia 30,000/310; Cycadopsida 307/307; Coniferopsida 618/618;
Chondrichthyes 1,044/1,044; and Decapoda 1,867/1,867.
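
The E/MSY calculation described in the Fig. 1 legend (extinctions per million species-years, with fossil standing diversity taken as the mean count of taxa crossing each 100,000-year boundary) can be sketched roughly as follows. This is not the authors' R code. The function names, the choice of interior boundaries only, and the toy occurrence ranges are all assumptions made for illustration.

```python
def e_msy(extinctions, standing_diversity, bin_years):
    """Extinctions per million species-years (E/MSY)."""
    species_years = standing_diversity * bin_years
    return extinctions / species_years * 1_000_000


def boundary_crosser_diversity(ranges, bin_old, bin_young, step=100_000):
    """Mean number of taxa crossing each interior boundary of a time bin.

    ranges: (first_occurrence, last_occurrence) pairs in years before
    present, so first_occurrence >= last_occurrence.
    Assumption: only boundaries strictly inside the bin are counted.
    """
    boundaries = range(bin_young + step, bin_old, step)
    if len(boundaries) == 0:
        raise ValueError("bin is too short to contain an interior boundary")
    counts = [
        sum(1 for first, last in ranges if first > b > last)
        for b in boundaries
    ]
    return sum(counts) / len(counts)


# Hypothetical occurrence ranges, for illustration only.
toy_ranges = [
    (2_000_000, 1_450_000),
    (1_800_000, 500_000),
    (1_250_000, 1_050_000),
]
diversity = boundary_crosser_diversity(toy_ranges, 2_000_000, 1_000_000)
# Taxa whose last occurrence falls inside the bin count as extinctions.
extinct = sum(1 for _, last in toy_ranges if 1_000_000 <= last < 2_000_000)
rate = e_msy(extinct, diversity, 1_000_000)
```

For the modern bins the legend uses total standing diversity (extinct plus surviving taxa) instead, which corresponds to passing that total directly as `standing_diversity`.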


Potentially valuable comparisons of extinction magnitude could come 
from assessing modern taxonomic groups that are also known from 
exceptionally good fossil records. The best fossil records are for near-shore 
marine invertebrates like gastropods, bivalves and corals, and temperate 
terrestrial mammals, with good information also available for Holocene 
Pacific Island birds. However, better knowledge of understudied
modern taxa is critically important for developing common metrics for 
modern and fossil groups. For example, some 49% of bivalves went extinct
during the end-Cretaceous event, but only 1% of today’s species have
even been assessed, making meaningful comparison difficult. A similar
problem prevails for gastropods, exacerbated because most modern 
assessments are on terrestrial species, and most fossil data come from 
marine species. Given the daunting challenge of assessing extinction risk 
in every living species, statistical approaches aimed at understanding what 
well sampled taxa tell us about extinction risks in poorly sampled taxa are 
critically important.

For a very few groups, modern assessments are close to adequate. 
Scleractinian corals, amphibians, birds and mammals have all known
species assessed (Fig. 2), although species counts remain a moving target.
In these groups, even though the percentage of species extinct in historic 
time is low (zero to 1%), 20-43% of their species and many more of their 
populations are threatened (Fig. 2). Those numbers suggest that we have 
not yet seen the sixth mass extinction, but that we would jump from one- 
quarter to halfway towards it if ‘threatened’ species disappear. 

Given that many clades are undersampled or unevenly sampled, 
magnitude estimates that rely on theoretical predictions rather than 
empirical data become important. Often species-area relationships or 
allied modelling techniques are used to relate species losses to habitat- 
area losses (Table 2). These techniques suggest that future species 
extinctions will be around 21-52%, similar to the magnitudes expressed
in Fig. 2, although derived quite differently. Such models may be sensitive
to the particular geographic area, taxa and species-area relationship
that is employed, and have usually used only modern data. However,
fossil-to-modern comparisons using species-area methods are now
becoming possible as online palaeontological databases grow. An
additional, new approach models how much extinction can be expected
under varying scenarios of human impact. It suggests a broader range of
possible future extinction magnitudes than previous studies, although 
all scenarios result in additional biodiversity decline in the twenty-first 
century. 


Combined rate-magnitude comparisons 


Because rate and magnitude are so intimately linked, a critical question is 
whether current rates would produce Big-Five-magnitude mass extinc- 
tions in the same amount of geological time that we think most Big Five 
extinctions spanned (Table 1). The answer is yes (Fig. 3). Current extinction
rates for mammals, amphibians, birds and reptiles (Fig. 3, light
yellow dots on the left), if calculated over the last 500 years (a conservatively
slow rate), are faster than (birds, mammals and amphibians, which
have 100% of species assessed) or as fast as (reptiles, uncertain because
only 19% of species are assessed) all rates that would have produced the
Big Five extinctions over hundreds of thousands or millions of years
(Fig. 3, vertical lines).

Would rates calculated for historical and near-time prehistoric 
extinctions result in Big-Five-magnitude extinction in the foreseeable 
future—less than a few centuries? Again, taking the 500-year rate as a 
useful basis of comparison, two different hypothetical approaches are 
possible. The first assumes that the Big Five extinctions took place 
suddenly and asks what rates would have produced their estimated 
species losses within 500 years (Fig. 3, coloured dots on the right).
Figure 3 | Extinction rate versus extinction magnitude. Vertical lines on the
right illustrate the range of mass extinction rates (E/MSY) that would produce 
the Big Five extinction magnitudes, as bracketed by the best available data from 
the geological record. The correspondingly coloured dots indicate what the 
extinction rate would have been if the extinctions had happened 
(hypothetically) over only 500 years. On the left, dots connected by lines 
indicate the rate as computed for the past 500 years for vertebrates: light yellow, 
species already extinct; dark yellow, hypothetical extinction of ‘critically 
endangered’ species; orange, hypothetical extinction of all ‘threatened’ species. 
TH: if all ‘threatened’ species became extinct in 100 years, and that rate of 
extinction remained constant, the time to 75% species loss—that is, the sixth 
mass extinction—would be ~240 to 540 years for those vertebrates shown here 
that have been fully assessed (all but reptiles). CR: similarly, if all ‘critically 
endangered’ species became extinct in 100 years, the time to 75% species loss 
would be ~890 to 2,270 years for these fully assessed terrestrial vertebrates.
(We emphasize that this is a hypothetical scenario and that we are not
arguing that all mass extinctions were sudden.) In that scenario, the rates 
for contemporary extinctions (Fig. 3, light yellow dots on the left) are 
slower than the rates that would have produced each of the Big Five 
extinctions in 500 years. However, rates that consider ‘threatened’ species 
as inevitably extinct (Fig. 3, orange dots on the left) are almost as fast as 
the 500-year Big Five rates. Therefore, at least as judged using these 
vertebrate taxa, losing threatened species would signal a mass extinction 
nearly on par with the Big Five. 

A second hypothetical approach asks how many more years it would 
take for current extinction rates to produce species losses equivalent to Big 
Five magnitudes. The answer is that if all ‘threatened’ species became 
extinct within a century, and that rate then continued unabated, terrestrial 
amphibian, bird and mammal extinction would reach Big Five magni- 
tudes in ~240 to 540 years (241.7 years for amphibians, 536.6 years for 
birds, 334.4 years for mammals). Reptiles have so few of their species 
assessed that they are not included in this calculation. If extinction were 
limited to ‘critically endangered’ species over the next century and those 
extinction rates continued, the time until 75% of species were lost per 
group would be 890 years for amphibians, 2,265 years for birds and 
1,519 years for mammals. For scenarios that project extinction of 
‘threatened’ or ‘critically endangered’ species over 500 years instead of 
a century, mass extinction magnitudes would be reached in about 1,200 
to 2,690 years for the ‘threatened’ scenario (1,209 years for amphibians, 
2,683 years for birds and 1,672 years for mammals) or ~4,450 to 11,330 
years for the ‘critically endangered’ scenario (4,452 years for amphi- 
bians, 11,326 years for birds and 7,593 years for mammals). 
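
The text does not spell out how these times were extrapolated. The sketch below assumes the simplest reading: a fixed fraction of the original species count is lost over the stated horizon, and losses then continue at that constant rate until 75% is reached. The function name and the 30% input are hypothetical and are not taken from the paper's data.

```python
def years_to_threshold(fraction_lost, horizon_years, threshold=0.75):
    """Years until `threshold` of species are lost under linear extrapolation.

    fraction_lost: fraction of species assumed to go extinct within
    horizon_years (for example, all 'threatened' species in 100 years).
    Assumption: the loss per year stays constant afterwards.
    """
    if not 0 < fraction_lost <= 1:
        raise ValueError("fraction_lost must be in (0, 1]")
    loss_per_year = fraction_lost / horizon_years
    return threshold / loss_per_year


# Hypothetical group in which 30% of species are lost within the horizon.
century_case = years_to_threshold(0.30, 100)
five_century_case = years_to_threshold(0.30, 500)
```

Under this assumption, stretching the horizon from 100 to 500 years multiplies the result by five, which agrees with the reported pairs (for example, 241.7 versus 1,209 years for amphibians).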

This emphasizes that current extinction rates are higher than those that 
caused Big Five extinctions in geological time; they could be severe enough 
to carry extinction magnitudes to the Big Five benchmark in as little as three 
centuries. It also highlights areas for much-needed future research. Among
major unknowns are (1) whether ‘critically endangered’, ‘endangered’ and
‘vulnerable’ species will go extinct; (2) whether the current rates we used in
our calculations will continue, increase or decrease; and (3) how reliably
extinction rates in well-studied taxa can be extrapolated to other kinds of
species in other places.


The backdrop of diversity dynamics 

Little explored is whether current extinction rates within a clade fall out- 
side expectations when considered in the context of long-term diversity 
dynamics. For example, analyses of cetacean (whales and dolphins) 
extinction and origination rates illustrate that within-clade diversity has 
been declining for the last 5.3 million years, and that that decline is nested 
within an even longer-term decline that began some 14 million years ago. 
Yet, within that context, even if ‘threatened’ genera lasted as long as 
100,000 years before going extinct, the clade would still experience an 
extinction rate that is an order of magnitude higher than anything it has 
experienced during its evolutionary history.

The fossil record is also enabling us to interpret better the significance 
of currently observed population distributions and declines. The use of
ancient DNA, phylochronology and simulations demonstrates that the
population structure considered ‘normal’ on the current landscape has
in fact already suffered diversity declines relative to conditions a few
thousand years ago. Likewise, the fossil record shows that species
richness and evenness taken as ‘normal’ today are low compared to
pre-anthropogenic conditions.


Selectivity 

During times of normal background extinction, the taxa that suffer
extinction most frequently are characterized by small geographic ranges
and low population abundance. However, during times of mass extinction,
the rules of extinction selectivity can change markedly, so that
widespread, abundant taxa also go extinct. Large-bodied animals
and those in certain phylogenetic groups can be particularly hard
hit. In that context, the reduction of formerly widespread ranges
and disproportionate culling of certain kinds of species may be
particularly informative in indicating that extinction selectivity is changing
into a state characterizing mass extinctions.


Perfect storms? 


Hypotheses to explain the general phenomenon of mass extinctions
have emphasized synergies between unusual events. Common features
of the Big Five (Table 1) suggest that key synergies may involve
unusual climate dynamics, atmospheric composition and abnormally
high-intensity ecological stressors that negatively affect many different
lineages. This does not imply that random accidents like a Cretaceous
asteroid impact would not cause devastating extinction on their own,
only that extinction magnitude would be lower if synergistic stressors
had not already ‘primed the pump’ of extinction.

More rigorously formulating and testing synergy hypotheses may be 
especially important in assessing sixth mass extinction potential, 
because once again the global stage is set for unusual interactions. 
Existing ecosystems are the legacy of a biotic turnover initiated by the 
onset of glacial-interglacial cycles that began ~2.6 million years ago, 
and evolved primarily in the absence of Homo sapiens. Today, rapidly
changing atmospheric conditions and warming above typical interglacial
temperatures as CO2 levels continue to rise, habitat fragmentation, pollution,
overfishing and overhunting, invasive species and pathogens (like
chytrid fungus), and expanding human biomass are all more
extreme ecological stressors than most living species have previously
experienced. Without concerted mitigation efforts, such stressors will
accelerate in the future and thus intensify extinction, especially given
the feedbacks between individual stressors.


View to the future 


There is considerably more to be learned by applying new methods that 
appropriately adjust for the different kinds of data and timescales inherent 
in the fossil records versus modern records. Future work needs to: (1) 
standardize rate comparisons to adjust for rate measurements over widely 
disparate timescales; (2) standardize magnitude comparisons by using the 
same species (or other taxonomic rank) concepts for modern and fossil 
organisms; (3) standardize taxonomic and geographic comparisons by 
using modern and fossil taxa that have equal fossilization potential; (4) 
assess the extinction risk of modern taxa such as bivalves and gastropods 
that are extremely common in the fossil record but are at present poorly 
assessed; (5) set current extinction observations in the context of long- 
term clade, species-richness, and population dynamics using the fossil 
record and phylogenetic techniques; (6) further explore the relationship 
between extinction selectivity and extinction intensity; and (7) develop 
and test models that posit general conditions required for mass extinction, 
and how those compare with the current state of the Earth. 

Our examination of existing data in these contexts raises two important 
points. First, the recent loss of species is dramatic and serious but does not 
yet qualify as a mass extinction in the palaeontological sense of the Big 
Five. In historic times we have actually lost only a few per cent of assessed 
species (though we have no way of knowing how many species we have 
lost that had never been described). It is encouraging that there is still 
much of the world’s biodiversity left to save, but daunting that doing so 
will require the reversal of many dire and escalating threats.

The second point is particularly important. Even taking into account 
the difficulties of comparing the fossil and modern records, and applying 
conservative comparative methods that favour minimizing the differ- 
ences between fossil and modern extinction metrics, there are clear indi- 
cations that losing species now in the ‘critically endangered’ category 
would propel the world to a state of mass extinction that has previously 
been seen only five times in about 540 million years. Additional losses of 
species in the ‘endangered’ and ‘vulnerable’ categories could accomplish 
the sixth mass extinction in just a few centuries. It may be of particular 
concern that this extinction trajectory would play out under conditions 
that resemble the ‘perfect storm’ that coincided with past mass extinc- 
tions: multiple, atypical high-intensity ecological stressors, including 
rapid, unusual climate change and highly elevated atmospheric CO2.
The huge difference between where we are now, and where we could
easily be within a few generations, reveals the urgency of relieving the 
pressures that are pushing today’s species towards extinction.

Copy number variation and selection
during reprogramming to pluripotency 


Samer M. Hussein, Nizar N. Batada, Sanna Vuoristo, Reagan W. Ching, Reija Autio, Elisa Närvä, Siemon Ng,
Michel Sourour, Riikka Hämäläinen, Cia Olsson, Karolina Lundin, Milla Mikkola, Ras Trokovic, Michael Peitz,
Oliver Brüstle, David P. Bazett-Jones, Kari Alitalo, Riitta Lahesmaa, Andras Nagy & Timo Otonkoski


The mechanisms underlying the low efficiency of reprogramming somatic cells into induced pluripotent stem (iPS) cells 
are poorly understood. There is a clear need to study whether the reprogramming process itself compromises genomic 
integrity and, through this, the efficiency of iPS cell establishment. Using a high-resolution single nucleotide 
polymorphism array, we compared copy number variations (CNVs) of different passages of human iPS cells with 
their fibroblast cell origins and with human embryonic stem (ES) cells. Here we show that significantly more CNVs 
are present in early-passage human iPS cells than intermediate passage human iPS cells, fibroblasts or human ES cells. 
Most CNVs are formed de novo and generate genetic mosaicism in early-passage human iPS cells. Most of these novel
CNVs placed the affected cells at a selective disadvantage. Remarkably, expansion of human iPS cells in culture selects
rapidly against mutated cells, driving the lines towards a genetic state resembling human ES cells. 


Reprogramming somatic cells to pluripotency can be achieved by
forced expression of a defined set of factors. Several methods have
been developed for generating human iPS cells, such as retroviral
transduction, DNA-transposition-based systems, transient plasmid
delivery and integration/plasmid-free systems. To improve efficiency
and in an effort to understand the process of reprogramming,
several groups have demonstrated that modulating key components of
the cell cycle, such as repression of the Ink4a/Arf locus or downregulation
of the p53-p21 pathway, has marked positive effects on reprogramming
efficiency. However, p53 suppression can lead to
increased levels of DNA damage and genomic instability. These findings
suggest that the reprogramming process places a heavy burden on
cellular integrity and highlight the importance of further exploring the 
nature of the DNA damage that is associated with the reprogramming 
process. 


High CNV levels in early-passage human iPS cells 


To determine whether reprogramming is associated with de novo-generated
CNVs, we used the Affymetrix SNP array 6.0 to characterize
22 human iPS cell lines along with 17 human ES cell lines, as well as
three parental and one unrelated fibroblast lines as controls
(Supplementary Table 1). The human iPS cell lines were established
either by retroviral or piggyBac gene delivery methods and confirmed
as human iPS cells using established criteria (Supplementary
Figs 1-3 and Supplementary Table 2). Nine of the 22 human
iPS cell lines were characterized at more than one passage to track 
CNVs during propagation. 

The median number of CNVs in human iPS cell lines (109) was 
about twofold higher than in human ES cell lines (55) and fibroblasts 
(53) (Supplementary Fig. 4a and Supplementary Tables 3 and 4). We 
found that the majority of CNVs (52.4%) in human iPS cells were not
present in either human ES cells or fibroblasts (Supplementary Fig. 4b).
Interestingly, the number of CNVs negatively correlated with the 
passage number. This was surprising because fibroblasts and human 
ES cells showed no significant changes during intermediate length 
passaging (Supplementary Fig. 4c, d). Both the number and the total 
size of CNVs in human iPS cell lines decreased during propagation 
(Fig. 1a and Supplementary Fig. 4e). Neither the reprogramming factor
delivery method, fibroblast source or viral integration sites nor the
presence or absence of Myc during reprogramming (Fig. 1b, c and Supplementary
Fig. 5) influenced these results. This trend was verified in
an independent data set on human iPS cell lines derived from four 
adult skin fibroblasts (Supplementary Table 5), as well as within indi- 
vidual human iPS cell lines analysed at early and later passages 
(Fig. 1b-d). Our findings indicate that CNVs are generated during 
the reprogramming process. 
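
The negative relationship between CNV count and passage number was assessed with Spearman's rank correlation and a t-distribution (see the Fig. 1 legend). A minimal pure-Python sketch of that kind of calculation is given below. The passage numbers and CNV counts are hypothetical, the function names are illustrative, and the P value (which needs a t-distribution table or a statistics library) is deliberately left out.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def t_statistic(rho, n):
    """t value with n - 2 degrees of freedom; undefined when |rho| == 1."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))


# Hypothetical passage numbers and CNV counts, for illustration only.
passages = [3, 5, 8, 12, 15, 20]
cnv_counts = [210, 160, 170, 95, 100, 60]
rho = spearman_rho(passages, cnv_counts)
t_value = t_statistic(rho, len(passages))
```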


Genetic mosaicism in human iPS cells 


The decrease in CNVs during passaging could be explained either by 
DNA repair mechanisms or by mosaicism followed by selection. We 
propose that DNA repair may not be efficient enough to explain the 
rapid decrease in CNVs but, instead, that de novo-generated CNVs 
create mosaicism, which is followed by selection favouring less damaged 
cells during propagation. To obtain direct proof for mosaicism, we 
established new human iPS cell lines and tested these at very early 
passages (passage 2 and 3) for CNVs by using fluorescence in situ 
hybridization (FISH). We chose a probe that maps to a locus on chro- 
mosome 1 that, according to our single nucleotide polymorphism 
(SNP) array data, is frequently affected in human iPS cell lines 
(Fig. 2a). A control probe was selected from a chromosome 1 location 
that showed normal copy number (2) across all human iPS cell lines 
that were tested. During early passages, the test probe demonstrated a
Figure 1 | High mutation level in human iPS cells is reduced through
moderate culture. a, Number of CNVs in human iPS cell lines with respect to
passage number. Each data point represents a sample: blue, retrovirus-derived
human iPS cell lines; orange, piggyBac lines. Spearman’s rank correlation
coefficient (ρ) and Student’s t-distribution were used for statistical analysis and
P-value calculations. b, c, With passaging, both retrovirus-derived (b) and 
piggyBac-transposon-derived (c) human iPS cell lines (listed) show a constant 
and sharp decrease in the number of CNVs. Numbers in parentheses indicate the 
number of factors used for generating the corresponding human iPS cell lines: 3F, 
OCT4, SOX2 and KLF4; 4F, OCT4, SOX2, NANOG and LIN28. d, Genomic 
representation demonstrating the sharp decline in the number of CNVs from 
early passage (passage 5) to intermediate passage (8) and intermediate-late 
passage (12) of the human iPS cell line FiPS7-5 relative to the CNVs in the 
parental fibroblast (F). Blue triangles represent amplifications, and red diamonds 
represent deletions, with colour intensity varying with passage number. 


significantly higher fraction of cells with aberrant copy number state 
than the control probe (Fig. 2a). The fraction of aberrant cells was also 
significantly higher in early-passage human iPS cells (18%) than in 
fibroblasts (3%) or in later-passage human iPS cells (9%) (Fig. 2b, Sup- 
plementary Fig. 6 and Supplementary Table 6). 

To provide evidence for selection, we focused on regions containing 
homozygous deletions, which DNA repair mechanisms cannot correct. 
Although we could detect only a small number of such deletions, our 
detection rate for homozygous deletions was very reproducible, detect- 
ing 98% of the deletions in three to four replicates (Supplementary 
Table 7). Our false discovery rate was 9.7% for detecting other types 
of CNV (Supplementary Table 7 and Supplementary Fig. 7), suggesting 
low error in calling CNVs and robust detection of homozygous dele- 
tions. We focused on homozygous deletions found only in human iPS 
cell lines and not their parental fibroblasts, and we categorized these 
into three groups: type ‘A’ homozygous deletions, which are present 
only in early passages; type ‘B’ homozygous deletions, which are
detected only in later passages; and type ‘C’ homozygous deletions,
Figure 2 | Increased mosaicism in early-passage human iPS cells. a, Merged
FISH field images of human foreskin fibroblast (HFF) cells at passage 10 (p10) 
and the human iPS cell line RV2-1 at p3 and p8. Test (green) and control (red) 
probes are located on chromosome 1. The pie charts below the images show the 
percentage of cells that have only aberrant test probe (but not control probe) foci 
counts or copy number (green), cells that have aberrant test and control probe 
foci counts (red), and normal cells (blue). Five different field images were 
counted per sample; n = the total number of cells counted. Dashed grey lines 
indicate normal cells, and orange lines indicate aberrant cells. Scale bar, 10 μm.
b, Histogram demonstrates the mean fraction of aberrant cells in fibroblasts 
(HFF p10) and early-passage (p2-3) and intermediate-passage (p6-8) human 
iPS cells. Each field contained approximately 20-100 cells; n = total number of 
fields counted. Error bars, s.e.m. One-way analysis of variance and the Tukey- 
Kramer post-hoc test were used for statistical analysis and P-value calculations. 
c, Three categories of homozygous deletions: type ‘A’, detected only during early 
passages; type ‘B’, appearing in later passages; and type ‘C’, seen in both early and 
intermediate passages. —/—, homozygous deletion (red band); +/+, normal 
copy number (blue band). d, Left, non-parental homozygous deletions present in 
five cell lines passaged from an early passage to an intermediate passage. Each 
circle represents a homozygous deletion: ‘A’, red circles; ‘B’, orange circles; and
‘C’, black circles. Right, combined total count of homozygous deletions.


which remain during passaging (Fig. 2c). Five of the cell lines presented 
with non-parental homozygous deletions at an early or intermediate 
passage (Fig. 2d). In four of the lines, we identified homozygous dele- 
tions that were selected against during passaging (type A). We also 
found type B and type C deletions, suggesting that selection pressure 
is bidirectional, selecting both for and against CNVs (Fig. 2d). 


Novel CNVs in early-passage human iPS cells 


We obtained a list of 6,596 non-overlapping common CNVs identified
in 270 healthy individuals from two combined studies in the
HapMap Project. These common CNVs could be considered to be
functionally the most neutral. We designated the set of CNVs that
were identified in our study and that do not belong to common CNVs 
as novel CNVs. They accounted for 15% of the total CNVs in fibro- 
blasts and 25% in human ES cells but 37% in human iPS cells 
(Supplementary Fig. 8a). The novel CNV fraction was significantly 
higher in early-passage human iPS cell lines than in later passages, in 
which it decreased to levels similar to those found in human ES cells 
and in fibroblasts (Supplementary Fig. 8b). Only a minority of non- 
parental human iPS cell CNVs that overlapped human ES cell CNVs 
were novel (Supplementary Fig. 8c). 
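
The novel/common split described above amounts to an interval comparison against the common-CNV list. The sketch below assumes that any overlap on the same chromosome is enough to call a CNV 'common'; the text does not state the overlap criterion, and the coordinates and function names are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base.

    Coordinates are treated as inclusive.
    """
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def split_novel(sample_cnvs, common_cnvs):
    """Return the CNVs that overlap no common CNV, and their fraction."""
    novel = [
        cnv for cnv in sample_cnvs
        if not any(overlaps(cnv, known) for known in common_cnvs)
    ]
    return novel, len(novel) / len(sample_cnvs)


# Hypothetical coordinates, for illustration only.
common = [("chr1", 100, 200), ("chr2", 500, 900)]
sample = [
    ("chr1", 150, 250),  # overlaps a common CNV
    ("chr1", 300, 400),  # novel
    ("chr2", 100, 499),  # novel: ends just before the common CNV starts
    ("chr3", 500, 900),  # novel: different chromosome
]
novel_cnvs, novel_fraction = split_novel(sample, common)
```

A stricter rule, such as a minimum reciprocal overlap, would reclassify some borderline calls and change the novel fraction accordingly.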


Selection against highly damaged human iPS cells 

In a mosaic population, CNVs can be identified with SNP arrays only 
if they are present above the detection threshold level, which depends 
on the type and the size of CNVs. For example, with the Affymetrix 
SNP array 6.0 and Genotyping Console 3.0 software algorithm, a 
trisomy might not be fully detectable if the mutant cell contribution 
is less than 40% (refs 13, 19). Owing to the multiple probe-based and 
threshold-based nature of CNV identification, a single large CNV 
found within a subpopulation of the cells (for example, type L cells 
in Fig. 3a) could be misrepresented as multiple small, consecutive 
CNVs, providing a false representation of the data. If the CNV is 
selected for during maintenance, these type L cells will become more 
prevalent in the population. Consequently, the detection of this large 
CNV would be more accurate in intermediate passages (that is, a 
relatively larger size and number of overlapping CNVs within the 
intermediate-passage cells would be shared with early-passage cells), 
and the number of ‘false’ consecutive CNVs would decrease. In the 
case of selection against a mutation, the CNV number would still 
show a decrease with passaging, but this mechanism would not affect
the size of overlapping CNVs between early and intermediate passages.
The overlap in CNVs between early and intermediate passages would 
be minimal. We therefore investigated whether this putative error 
component could account for the observed high number of novel 
CNVs in early-passage lines (Fig. 3a). 

Focusing on novel CNVs present only in human iPS cells, we found 
that the overlapping CNV size was equivalent among different passages 
and clones (Fig. 3b), but the number of overlapping (shared) novel 
CNVs in human iPS cells was relatively small (Fig. 3c), indicating that 
most CNVs in early-passage lines are not the product of type L cell 
subpopulations and are indeed ‘true’ CNVs. Moreover, the rate of 
selection against novel CNVs in a passage interval (change in novel 
CNV number divided by number of passages) was significantly higher 
between the relatively early passages and the intermediate passages 
than it was between the intermediate and late passages. The latter rate 
was comparable to the selection rate of human ES cells (Fig. 3d), sug- 
gesting that early-passage human iPS cells endure strong selection 
pressure and lose the majority of their de novo mutations. These results 
demonstrate that most of the novel CNVs in human iPS cells are 
generated during the reprogramming process. 
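
The selection rate used here is the change in novel CNV number divided by the number of passages in the interval. A small sketch follows, with the sign chosen so that a falling CNV count gives a positive rate; that sign convention, the function name and the counts are assumptions made for illustration.

```python
def selection_rate(cnvs_start, cnvs_end, passage_start, passage_end):
    """Novel CNVs lost per passage over an interval.

    Positive values mean the novel CNV count fell during passaging.
    """
    passages = passage_end - passage_start
    if passages <= 0:
        raise ValueError("passage_end must be later than passage_start")
    return (cnvs_start - cnvs_end) / passages


# Hypothetical counts for one line at early, intermediate and late passage.
early_to_intermediate = selection_rate(60, 20, 5, 13)
intermediate_to_late = selection_rate(20, 14, 13, 25)
```

In this toy case the early interval loses novel CNVs ten times faster than the later one, which is the kind of contrast the comparison in Fig. 3d is designed to show.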


Novel CNVs recur within fragile regions 

To investigate possible sources of negative selection, we asked whether 
the high level of de novo mutations led to functional consequences, 
such as an increase in senescence or apoptosis or a decrease in self- 
renewal. We assessed mutations within genes that may affect differ- 
entiation, proliferation or maintenance of pluripotency. In early pas- 
sages but not in later passages, several deletions were found in genes 
and regions essential for maintaining an undifferentiated state (Supplementary
Table 8). Such mutations included deletions in the genes
Figure 3 | High selection pressure against de novo mutations suggests
reprogramming as the source of novel CNVs. a, Possible error in SNP array 
output from a mosaic human iPS cell colony containing type L cells (cells with a 
large CNV) and type Sh cells (cells with shared novel CNVs). b, Average 
overlapping novel CNV size (kilobases, kb) in human iPS cell lines at early, 
intermediate and late passage. Error bars, s.e.m. c, Left, plot showing the change
in the number of novel CNVs in human iPS cells with passaging. Blue circle size
corresponds to the number of CNVs observed at the passage tested, with the
number listed next to the circle. Right, the number of overlapping CNVs. Values 
for early (E) compared with intermediate (I) or late (L) passage are shown in 
red, and for I compared with L are shown in grey. d, Selection rate of human iPS 
cells. The rate is calculated as change in novel CNV number divided by change 
in passage number. Four human ES cell lines were used as controls. Arrows 
indicate the start and the end of the passage range tested for each sample.
encoding the epidermal growth factor receptor, fibroblast growth factor
receptor 2, β-catenin (also known as CTNNB1) and polycomb-bound
regions, all of which have been implicated in human ES cell maintenance.
We also found that six early-passage human iPS cell lines
had deletions in the regions encoding the microRNAs let-7c and miR- 
125b, which affect the expression of genes known to be involved in 
human ES cell differentiation and maintenance, such as those encoding 
Myc, Ras, p53 and ERBB3 (refs 24-27). 

To explore possible mechanisms behind reprogramming-induced 
CNVs, we investigated mutations reminiscent of those induced by 
replication stress, such as DNA replication fork stalling and collapse, 
and we assessed CNVs found in regions of genomic fragility, such as 
common fragile sites (CFSs) and subtelomeric regions. CFSs contain
late-replicating sequences and are a major target for genomic
rearrangements in oncogene-expressing and pre-neoplastic cells.
We compiled a list of CFSs from published reports (Supplementary 
Table 9) and measured the fraction of recurring deletions within CFS 
regions compared with the whole genome. Deletions recurred more 
frequently in CFSs than in the generic part of the genome, and more 
specifically they recurred more frequently in human iPS cells than in 
human ES cells and fibroblasts (Fig. 4a and Supplementary Table 9). 
Furthermore, this recurring CNV fraction consisted mainly of novel 
CNVs in human iPS cells (Fig. 4a), suggesting that a higher level of 
novel CNVs may result in part from replication stress**”. This obser- 
vation was consistent with previous reports demonstrating increases in 
the level of reactive oxygen species during reprogramming. Reactive 
oxygen species are prevalent in cells undergoing replication stress and 
may contribute to the incidence of mutations in other parts of the 
genome as well”. 
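
The comparison of recurring deletions in CFSs and in the whole genome can be illustrated with a small sketch. It assumes that a deletion is identified by a region label, that a deletion observation counts as recurring when its region is seen in more than one sample, and that the two proportions are compared with a Pearson chi-squared test on a 2 × 2 table without continuity correction. The function names and all counts are hypothetical; the exact tabulation used for Fig. 4a is not specified in the text.

```python
import math
from collections import Counter


def recurring_counts(deletions_by_sample):
    """Return (recurring, total) deletion observations.

    deletions_by_sample maps a sample name to the set of deleted region labels.
    An observation is recurring if its region occurs in more than one sample.
    """
    counts = Counter(
        region for regions in deletions_by_sample.values() for region in set(regions)
    )
    total = sum(counts.values())
    recurring = sum(c for c in counts.values() if c > 1)
    return recurring, total


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and P value (1 d.f.) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For one degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: recurring / non-recurring deletions inside CFSs,
# then recurring / non-recurring deletions in the rest of the genome.
chi2, p_value = chi_squared_2x2(20, 10, 10, 20)
print(round(chi2, 2), round(p_value, 4))
```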

To examine mutations that correlate with senescent or apoptotic 
cells, we focused on deletions incurred in subtelomeric regions 
Figure 4 | Frequent mutations in fragile genomic regions influence 
selection during the expansion of human iPS cells. a, Recurring deletions as a 
proportion of total CNV deletions and novel CNV deletions, within either CFSs 
or the whole genome. Deletions are considered to be recurring if they are found 
in more than one sample (n = the total number of deletions observed). For 
human iPS cells, only non-parental deletions are considered. The chi-squared 
test was used for statistical analysis and P-value calculations. b, Summary 
model illustrating the increase in the number of CNVs that results from 
replication stress during reprogramming, followed by a selection phase that 
occurs after reprogramming and eliminates unstable human iPS cells 
containing high numbers of CNVs. 
because these areas have been shown to be highly sensitive to DNA 
double strand breaks**”’ and because deletions within these regions 
are a major cause of chromosomal instability’. We compared the 
average deletion size with those seen in CFSs and the whole genome. 
The average deletion size within subtelomeric regions was signifi- 
cantly larger in early-passage lines than in later passages, while 
remaining unchanged in the generic part of the genome and in 
CFSs (Supplementary Fig. 9a). We also found that several early- 
passage human iPS cell lines had deletions in the subtelomeric region 
nearest to the telomeres (25 kilobases away) (Supplementary Fig. 9b). 
The increased selection against large subtelomeric deletions is con- 
sistent with the idea that CNVs in these areas probably lead to a higher 
level of phenotypic change because these areas are gene rich and prone 
to genomic instability*””. 


Discussion 


Two recent studies***' report on the observation of specific genomic 
aberrations associated with the pluripotent state in human ES cells 
and human iPS cells. One group carried out a meta-analysis of large 
numbers of gene expression profiles that had been determined for 
pluripotent stem cells by different laboratories’. They showed that 
human iPS cells are subject to the type of culture adaptations that have 
been shown to affect the karyotypic integrity of human ES cells”. 
Their data also suggest that a distinct category of genomic aberrations 
may be associated with the early phase of human iPS cell establish- 
ment. Their conclusion is in line with the second recent report, in 
which SNP arrays were used to compare CNVs in a large number of 
normal somatic cell lines, human ES cell lines and human iPS cell 
lines*!. Interestingly, human ES cells were found to contain more 
gains, and human iPS cells more deletions, than somatic cell samples. 
This finding further substantiates the differences between these two 
types of pluripotent cell. It also underscores the differences in the 
selection forces that affect human ES cells and human iPS cells (at 
least during the establishment period) and that could affect the quality 
of the final products. Data from the second study”' suggest that the 
reprogramming process is associated with selection for deletions that 
affect tumour-suppressor genes, whereas maintenance of the cell lines 
selects for duplications in oncogenic genes. 

From our study, we conclude that the reprogramming process is 
associated with high mutation rates, causing increased levels of CNVs 
and genetic mosaicism in the resultant early-passage human iPS cell 
lines. Our data also suggest that de novo CNVs are the consequence of 
replication stress (Fig. 4b). Using our approach, we failed to find 
evidence suggesting that other mechanisms operate. Of the 116 
DNA-repair-related and/or checkpoint-related genes that we investi- 
gated, we found only four cell lines in which a CNV might have 
affected a single gene (Supplementary Table 10). However, because 
our study was limited to CNVs, we could not exclude the possibility of 
other types of mutation that lead to perturbations of checkpoints or 
repair of DNA double strand breaks. Such mutations could lead to 
non-allelic homologous recombination (NAHR)-based rearrangements 
and/or non-homologous end-joining (NHEJ)-based rearrangements. 
Both NAHR and NHEJ have been reported to be involved in 
CNV formation”. 

In summary, because most de novo mutations confer a growth or 
survival disadvantage to the cells, they are selected against, eventually 
leading to a CNV load similar to that found in human ES cells. This 
negative selection, however, does not exclude the possibility that certain 
hazardous aberrations give the cell a selective advantage over cells with 
an intact genome. Our results highlight the importance of understand- 
ing the molecular mechanisms underlying the reprogramming of so- 
matic cells to a pluripotent state, with particular emphasis on forces that 
negatively affect the integrity of the genome. With a better understand- 
ing of the reprogramming process, we will increase the likelihood of 
finding ways to counteract the pitfalls and create human iPS cells that 
can safely be used for cell-based therapies in the future. 
METHODS SUMMARY 


Human fibroblast lines were reprogrammed by retroviral transduction’ and 
piggyBac transposition as previously described*. Human iPS cell lines were 
expanded and characterized as previously described". In vitro differentiation of 
human iPS cells was carried out using embryoid body, neuronal and endodermal 
differentiation protocols as described in the Methods. Teratomas were generated 
as described elsewhere**. Supplementary Table 2 lists the details of the character- 
ization of each human iPS cell clone and the factors used for reprogramming. 
Bisulphite sequencing of NANOG and OCT4 promoters was performed as prev- 
iously described*’. Splinkerette PCR was used to identify viral integration sites in 
three human iPS cell lines as previously described*’. FISH protocols are provided 
in the Supplementary Information. Samples were run on Affymetrix SNP array 
6.0, and Genotyping Console 3.0.2 was used to analyse and determine CNV levels, 
genotype calls and loss of heterozygosity detection as detailed in the Methods. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


SNP array 6.0 analysis. Sample handling and hybridization were performed as 
previously described’’. All human ES cell line analysis files, with the exception of 
CA1 and CA2, were obtained from a previous study'*. For detecting CNVs and 
genotype calls, the Affymetrix Genotyping Console 3.0.2 and the Birdseed (v2) 
algorithm were used, respectively. CNV locations are based on the human genome 
assembly of March 2006 (NCBI36/hg18). Samples were normalized to 40 
International HapMap samples hybridized on the same platform to decrease tech- 
nical variation (refer to Supplementary Table 11 for HapMap sample codes and 
SNP profiles)’. For CNV calls, regional GC correction, 10-kilobase (kb) size cut-off 
value, and a minimum of ten markers were used as analysis configurations. All of 
the array samples passed quality control requirements, having contrast QC (quality 
control) and MAPD (median absolute pairwise difference) values within the 
boundaries (Supplementary Table 12). All identified CNVs were included, except 
for CNVs spanning centromeric regions (the average marker distribution within 
these regions is too large (>40kb)) and the Y chromosome in female samples, 
which was considered as false positive and excluded from the analysis. R (v2.9.2) 
software and the program Microsoft Excel 2008 (v12.2.3) were used for in silico data 
analysis and CNV data parsing. R and StatPlus for Microsoft Excel (v5.8.3.8) were 
used for statistical analysis and P-value calculations. 
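
The CNV call configuration described above (10-kb size cut-off, a minimum of ten markers, and exclusion of centromere-spanning calls and of Y-chromosome calls in female samples) can be expressed as a simple filter. This is a sketch under stated assumptions: the `CnvCall` record, the function name and the centromere coordinates are hypothetical, and "spanning" is interpreted here as any overlap with the centromere interval. It is not the Genotyping Console implementation.

```python
from dataclasses import dataclass

MIN_SIZE_BP = 10_000  # 10-kilobase size cut-off
MIN_MARKERS = 10      # minimum number of array markers per call


@dataclass
class CnvCall:
    chrom: str
    start: int
    end: int
    n_markers: int


def passes_filters(call, sample_is_female, centromeres):
    """Return True if a CNV call survives the size, marker and region filters."""
    if call.end - call.start < MIN_SIZE_BP or call.n_markers < MIN_MARKERS:
        return False
    if sample_is_female and call.chrom == "chrY":
        return False  # treated as a false positive in female samples
    centromere = centromeres.get(call.chrom)
    if centromere is not None:
        cen_start, cen_end = centromere
        if call.start < cen_end and call.end > cen_start:
            return False  # overlaps the sparsely covered centromeric region
    return True


# Placeholder centromere interval for illustration only (not real hg18 coordinates).
example_centromeres = {"chr1": (121_000_000, 124_000_000)}
calls = [
    CnvCall("chr1", 1_000, 50_000, 25),
    CnvCall("chr1", 1_000, 9_000, 25),
    CnvCall("chr1", 120_000_000, 125_000_000, 400),
]
kept = [c for c in calls if passes_filters(c, False, example_centromeres)]
print(len(kept))  # 1
```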

Cell culture. Human fibroblast lines were cultured in 10% FBS (PromoCell) and 
GlutaMAX in DMEM (Gibco). Human iPS cells were cultured on mitotically inac- 
tivated mouse embryonic fibroblasts (MEFs) in KnockOut DMEM supplemented 
with 20% KnockOut Serum Replacement (Gibco), 0.1 mM 2-mercaptoethanol 
(Gibco), 1× GlutaMAX (Gibco), 1× non-essential amino acids (Gibco), 1× ITS 
liquid media supplement (Sigma) and 6 ng ml⁻¹ FGF2 (Sigma). Human iPS cells 
were passaged using 20 U ml⁻¹ type IV collagenase (Gibco), approximately every 
5 days. Human ES cells were cultured and maintained as previously described’. 

Bisulphite sequencing. Bisulphite conversion was carried out on each DNA 
sample (1 μg) using the EpiTect Bisulfite Kit (QIAGEN). OCT4 and NANOG 
promoters were amplified using previously published* bisulphite-specific primers 
(Supplementary Table 13) and a PCR protocol consisting of an initial 1-min 
denaturation step followed by 35 cycles of 95°C for 15s, 54°C for 30s and 
72°C for 45 s. The resultant PCR product was sequenced using either the appro- 
priate forward primer or the reverse primer at the Centre for Applied Genomics 
(Toronto). At CG dinucleotides, cytidine-guanine was scored as methylated CG, 
whereas thymidine-guanine was considered to be an unmethylated CG. 
Ambiguous CGs were scored using control fibroblasts as a methylated reference. 
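
The scoring rule for CG dinucleotides can be illustrated as follows. The sketch assumes a forward-strand read that is already aligned to the unconverted reference without gaps; the function name and the sequences are hypothetical. Sites that read neither CG nor TG are only flagged as ambiguous here, because resolving them against the control fibroblast reference requires data not shown in this section.

```python
def score_cpg_sites(reference, converted_read):
    """Classify each reference CG site from a bisulphite-converted read.

    CG in the read is scored as methylated and TG as unmethylated;
    anything else is reported as ambiguous.
    """
    if len(reference) != len(converted_read):
        raise ValueError("reference and read must be aligned to the same length")
    calls = []
    for i in range(len(reference) - 1):
        if reference[i:i + 2] != "CG":
            continue
        dinucleotide = converted_read[i:i + 2]
        if dinucleotide == "CG":
            calls.append((i, "methylated"))
        elif dinucleotide == "TG":
            calls.append((i, "unmethylated"))
        else:
            calls.append((i, "ambiguous"))
    return calls


# Hypothetical eight-base example with CG sites at positions 1 and 5.
print(score_cpg_sites("ACGTTCGA", "ACGTTTGA"))
# [(1, 'methylated'), (5, 'unmethylated')]
```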

Splinkerette PCR and quantitative PCR. Genomic DNA was extracted using 
the GenElute Mammalian Genomic DNA Miniprep kit (Sigma). Splinkerette 
PCR was performed as described previously**. Splinkerette primers are listed in 
Supplementary Table 13, and the start position and location of viral integration 
sites are listed in Supplementary Table 14. Total RNA was extracted using a 
NucleoSpin RNA II kit (Macherey-Nagel), with on-column DNase treatment. 
The amount of RNA was quantified using a Nanodrop (NanoDrop Tech- 
nologies), and RNA was separated on 1% agarose gels to check its quality. 
Highly pure RNAs were reverse transcribed using the QuantiTect Reverse 
Transcription Kit (QIAGEN) as per the manufacturer’s protocol. Supplemen- 
tary Table 13 lists all PCR-amplified genes and CNVs and their corresponding 
primers. Annealing temperatures of 55-58 °C were used for most primers. For 
quantitative PCR (Q-PCR), we used LuminoCt SYBR Green qPCR ReadyMix 
(Sigma), a JANUS automated liquid handling robot (PerkinElmer) and the 
CFX384 real-time PCR detection system (Bio-Rad). 

False discovery estimation and CNV validation. The false positive estimate for 
the samples was studied by hybridizing three HapMap samples in four replicates 
(Supplementary Table 7). By using analysis settings identical to those for the main 
data, we found that, on average, 76.2% of total CNV size was detected in all four 
replicates, 15.4% in three, 4.8% in two and 3.6% only in one of the replicates. By 
contrast, for homozygous deletions, no CNVs were detected in only one replicate, 
indicating very low or negligible false positive detection for homozygous dele- 
tions. These values are analogous to those from an earlier study*’. For further 
validation of CNVs, CNVs from three ES cell lines were also confirmed by 
running the same samples on an Illumina Human 610-Quad Chip platform. 
The CNVs from the Illumina data matched 75% (on average) of the CNVs 
observed in the Affymetrix data (Supplementary Table 5). The Illumina Data 
were analysed for log Bayes factors greater than 10 using QuantiSNP software 
(http://www.well.ox.ac.uk/QuantiSNP). Q-PCR was also used to validate some of 
the discovered CNVs and to estimate the false discovery rate (see Supplementary 
Fig. 7 for details). 
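
The replicate analysis above reports the share of total CNV size detected in four, three, two or one of the replicates. One way to compute such shares is a sweep over interval breakpoints, sketched below. The sketch assumes that calls are half-open intervals on a single chromosome and that calls within one replicate do not overlap each other; the function name and the intervals are hypothetical and do not reproduce the HapMap data.

```python
def size_fraction_by_replicate_count(replicates):
    """Fraction of the total called length covered by exactly k replicates.

    replicates is a list (one entry per replicate) of lists of (start, end) calls.
    """
    events = []
    for calls in replicates:
        for start, end in calls:
            events.append((start, 1))
            events.append((end, -1))
    events.sort()

    covered = {k: 0 for k in range(1, len(replicates) + 1)}
    depth = 0
    previous = None
    for position, delta in events:
        if previous is not None and depth > 0:
            covered[depth] += position - previous
        depth += delta
        previous = position

    total = sum(covered.values())
    if total == 0:
        return {k: 0.0 for k in covered}
    return {k: length / total for k, length in covered.items()}


# Hypothetical calls from four replicates of one sample.
fractions = size_fraction_by_replicate_count(
    [[(0, 100)], [(0, 100)], [(0, 80)], [(0, 50)]]
)
print(fractions)  # {1: 0.0, 2: 0.2, 3: 0.3, 4: 0.5}
```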

Human iPS cell generation. Human foreskin fibroblasts (HFFs; CRL-2429, 
ATCC) and human lung embryonic fibroblasts (IMR90; CCL-186, ATCC) were 
reprogrammed to human iPS cells as previously described’. Briefly, retroviral 
constructs—pMXs-OCT4, pMXs-SOX2, pMXs-KLF4, pMXs-NANOG and 
pMXs-LIN28—were obtained by cloning the human cDNA encoding each of 
the factors into the pMXs retroviral vector. pMXs constructs were transfected 
separately into the 293-GPG packaging cell line" (10° cells per 100-mm-diameter 
culture dish) to produce retroviral supernatant. Fibroblast lines, seeded over- 
night, were infected twice with different, but equally mixed, combinations of viral 
supernatants (0.5 ml each supernatant, 4 10° cells per 60-mm-diameter dish), 
over the course of 2 days (see Supplementary Table 2 for the different combina- 
tions). The following day, the medium was changed to fibroblast medium. On day 
4, infected cells were collected and reseeded on mitotically inactivated MEFs. The 
next day, the medium was changed to human ES cell medium containing FGF2 as 
described elsewhere**. Medium was replenished every 2 days. At 20-30 days 
post transduction, depending on colony size, colonies with human ES-cell-like 
morphology were picked and expanded for further analysis. For the new 
piggyBac-transposon-generated human iPS cell lines, HFF cells were seeded in 
60-mm-diameter plates at a density of 4 X 10° cells per plate. After 24h culturing, 
cells were trypsinized, and they were then electroporated using a 100-μl tip and 
program number 20 in the Neon Transfection System (Invitrogen) with 250 ng 
each transposon construct’, 500 ng PB-rtTA construct* and 500 ng pCyL43 PB 
transposase plasmid’. After 24 h, the medium was supplemented with doxycycline 
(day 0) and was then changed to human ES cell medium at 48 h after 
transfection. Cells were fed every 2 days with doxycycline-containing medium 
(1.5 μg ml⁻¹) for 20-30 days. Doxycycline was removed one passage after picking 
human iPS cell clones. Human iPS cell colonies were picked and cultured as 
described above for retrovirus-derived human iPS cells. For sample collection 
and genomic DNA extraction, cells were scraped in collagenase or dispase (1 mg 
ml⁻¹) and centrifuged twice at low speed to pellet the cells as small colonies and 
remove the majority of MEFs, which remain as single cells in suspension and are 
aspirated with the medium. 

Pluripotent stem cell differentiation. For embryoid body formation, the cells 
were detached by collagenase IV treatment and plated onto ultra-low attachment 
dishes (Corning) in human ES cell medium without FGF2. The culture medium 
was changed every 3 days. After 10 days, the embryoid bodies were collected for 
further analysis. Teratomas were generated as described elsewhere“. 

For endodermal differentiation, cells were differentiated as described else- 
where”. In brief, 80-90% confluent cells were cultured on a mitotically inactivated 
MEF layer for 24h in RPMI 1640 medium (Gibco) supplemented with GlutaMAX, 
100 ng ml⁻¹ recombinant human activin A (provided by M. Hyvönen) and 10% 
(v/v) WNT3A-conditioned medium (DMEM supplemented with 10% (v/v) 
KnockOut Serum Replacement and GlutaMAX, conditioned for 7 days on L 
Wnt-3A cells (ATCC)). The cells were cultured for another 2 days in RPMI 
1640 with GlutaMAX, 100 ng ml⁻¹ activin A and 0.2% (v/v) FBS to the definitive 
endoderm (DE) stage. DE-stage cells were then cultured for 3 days in RPMI 1640 
supplemented with GlutaMAX, 2% (v/v) FBS and 50 ng ml⁻¹ KGF (R&D Systems) 
to the primitive gut tube (PG) stage. The cells were cultured for another 3 days with 
DMEM supplemented with GlutaMAX, 1% (v/v) B-27 supplement (Gibco), 2 μM 
all-trans retinoic acid (Sigma), 0.25 μM KAAD-cyclopamine (Toronto Research 
Chemicals) and 50 ng ml⁻¹ noggin (R&D Systems) to the posterior foregut (PF) 
stage. Finally, the cells were cultured for another 3 days in DMEM supplemented 
with GlutaMAX and 1% (v/v) B-27 supplement to the pancreatic endoderm (PE) 
stage. The medium was changed every day, and RNA samples were collected at the 
end of every stage for Q-PCR and immunocytochemistry. 

For neuronal differentiation, cultured cells were detached with type IV 
collagenase and transferred as small colonies in a 1/1 ratio to ultra-low binding 
six-well plates (Costar) in NSE medium (Euromed medium supplemented with 
sodium pyruvate (Gibco), GlutaMAX, N-2 supplement (Gibco), B-27 supplement, 
25 μg ml⁻¹ human insulin (Sigma), non-essential amino acids, 0.1 mM 
2-mercaptoethanol and 0.05% (v/v) BSA (Gibco)). After 6 days in suspension 
culture, the spheres were transferred onto plates coated with 1/100-diluted 
growth-factor-reduced Matrigel (BD Biosciences) in a 1/1 ratio of NSE and NB 
medium. NB medium consists of neurobasal medium (Gibco) supplemented with 
GlutaMAX, non-essential amino acids, 2% (v/v) B-27 supplement, 2 μg ml⁻¹ 
heparin (Sigma), 0.1 mM 2-mercaptoethanol and 0.05% (v/v) BSA. The cells were 
cultured for another 10 days, and the medium was changed every other day. The 
cells were then immunostained for βIII-tubulin and nestin. 

Immunocytochemistry. Samples were washed with PBS and fixed in 4% paraformaldehyde 
(Electron Microscopy Sciences) for 15 min at room temperature. 
After three washes in PBS, cells were permeabilized in 0.2% Triton X-100 in PBS 
for 12 min and were subsequently washed three times with PBS. Samples were 
then blocked with Protein Block for 10 min, washed three times with PBS and 
incubated with primary antibodies overnight at 4°C. The next day, cells were 
washed twice with Tween-20-PBS and twice with PBS. Secondary antibodies— 
Alexa Fluor 594 anti-goat IgG or Alexa Fluor 488 anti-rabbit IgG (both from 
Invitrogen)—were diluted 1/500 in 0.2% Triton X-100 in PBS, and cells were 
incubated with antibodies for 30 min at 4 °C. Primary antibodies were anti-NANOG 
(Santa Cruz Biotechnology), anti-OCT4 (Santa Cruz Biotechnology), 
anti-brachyury (Santa Cruz Biotechnology), anti-FOXA2 (Santa 
Cruz Biotechnology), anti-SOX17 (Santa Cruz Biotechnology), anti-TRA-1-60 
(Millipore), anti-βIII-tubulin (R&D Systems), anti-PDX1 (Beta Cell Biology 
Consortium), anti-NKX6.1 (Beta Cell Biology Consortium) and anti-nestin 
(Chemicon) antibodies. 

Three-dimensional FISH. Human iPS cells were cultured on glass slides seeded 
with MEF feeder cells. Samples were fixed in 2% paraformaldehyde in PBS for 
5 min, washed three times with PBS, permeabilized with 0.5% Triton X-100 in 
PBS for 20 min, and washed three more times with PBS. The slides were then 
placed in a solution of 20% glycerol in PBS overnight at 4 °C. Slides were frozen in 
liquid nitrogen, allowed to partly thaw and then placed back into the 20% glycerol 
solution. This process was repeated five times. After the freeze-thaw procedure, 
the slides were washed three times in PBS and then placed in a solution of 0.1 M 
HCl for 5 min. Slides were then washed with 2× SSC and left overnight at 4 °C in a 
solution of 50% formamide in 2× SSC. Before hybridization, the slides were 
denatured in a solution of 70% formamide in 2× SSC at 75 °C for 3 min and 
then immediately placed in a separate container containing the same denatura- 
tion solution that had been kept on ice. Control (Bac clone RP11-788E9) and test 
(Bac clone RP11-58E1) probes were obtained from the Centre for Applied 
Genomics (Toronto). Test and control probe region coordinates were chr1: 
146,828,351-147,150,258 and chr1: 104,629,600-104,808,778, respectively, based 
on the human genome assembly of March 2006 (NCBI36/hg18). The test probe 
was selected based on a cluster of CNVs consisting of mainly deletions within a 
frequently affected region in chromosome 1 (coordinates chr1: 145,797,568- 
147,958,358). The probes were directly labelled with either spectrum green or 
orange fluorophore-conjugated nucleotides. A hybridization mixture consisting 
of labelled probe and human Cot-1 DNA in a 2/1 ratio in hybridization buffer 
(50% formamide, 10% dextran sulphate, 50 nM sodium phosphate buffer, pH 7.0, in 
2× SSC) was prepared and denatured at 80 °C for 5 min and then allowed to partially 
reanneal at 37 °C for 20 min. This mixture was then applied to the slides that had 
been kept on ice during the previous step and left to hybridize overnight at 37 °C. 
After hybridization, the slides were washed in 50% formamide in 2× SSC three times 
at 42 °C, then once in a solution of 0.5× SSC at 60 °C, and finally in a solution of 2× 
SSC at room temperature. Slides were mounted with VECTASHIELD containing 
DAPI (Vector Laboratories) before fluorescence imaging. Images were collected 
using an IX81 inverted brightfield microscope (Olympus) equipped with a 
Cascade 512 camera (Photometrics) using a ×60, 1.32 NA, oil-immersion objective 
and Immersion Oil Type DF (Cargille Labs) imaging medium. Images were collected 
using MetaMorph Premier 7.7 (Molecular Devices) and analysed with ImageJ 
(National Institutes of Health). 
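
As a consistency check on the probe design, the coordinates quoted above can be parsed and compared: the test probe should lie inside the frequently affected chromosome 1 region, and the control probe should lie outside it. The helper names below are hypothetical; the coordinates are those given in the text (NCBI36/hg18).

```python
def parse_region(text):
    """Parse a string such as 'chr1: 146,828,351-147,150,258' into (chrom, start, end)."""
    chrom, span = text.replace(",", "").replace(" ", "").split(":")
    start, end = span.split("-")
    return chrom.lower(), int(start), int(end)


def contains(outer, inner):
    """True if inner lies entirely within outer on the same chromosome."""
    return outer[0] == inner[0] and outer[1] <= inner[1] and inner[2] <= outer[2]


affected_region = parse_region("chr1: 145,797,568-147,958,358")
test_probe = parse_region("chr1: 146,828,351-147,150,258")
control_probe = parse_region("chr1: 104,629,600-104,808,778")

print(contains(affected_region, test_probe))     # True
print(contains(affected_region, control_probe))  # False
```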
Somatic coding mutations in human 
induced pluripotent stem cells 


Athurva Gore!, Zhe Li!*, Ho-Lim Fung', Jessica E. Young’, Suneet Agarwal’, Jessica Antosiewicz-Bourget*, Isabel Canto’, 
Alessandra Giorgetti®, Mason A. Israel’, Evangelos Kiskinis®, Je-Hyuk Lee’, Yuin-Han Loh®, Philip D. Manos?, Nuria Montserrat”, 
Athanasia D. Panopoulos®, Sergio Ruiz®, Melissa L. Wilbert”, Junying Yu’, Ewen F. Kirkness’, Juan Carlos Izpisua Belmonte”’®, 
Derrick J. Rossi!°, James A. Thomson‘, Kevin Eggan®, George Q. Daley’, Lawrence S. B. Goldstein? & Kun Zhang! 


Defined transcription factors can induce epigenetic reprogramming of adult mammalian cells into induced pluripotent 
stem cells. Although DNA factors are integrated during some reprogramming methods, it is unknown whether the 
genome remains unchanged at the single nucleotide level. Here we show that 22 human induced pluripotent stem 
(hiPS) cell lines reprogrammed using five different methods each contained an average of five protein-coding point 
mutations in the regions sampled (an estimated six protein-coding point mutations per exome). The majority of these 
mutations were non-synonymous, nonsense or splice variants, and were enriched in genes mutated or having causative 
effects in cancers. At least half of these reprogramming-associated mutations pre-existed in fibroblast progenitors at low 
frequencies, whereas the rest occurred during or after reprogramming. Thus, hiPS cells acquire genetic modifications in 
addition to epigenetic modifications. Extensive genetic screening should become a standard procedure to ensure hiPS 
cell safety before clinical use. 


Human induced pluripotent stem cells have the potential to revolu- 
tionize personalized medicine by allowing immunocompatible stem 
cell therapies to be developed’*. However, questions remain about 
hiPS cell safety. For clinical use, hiPS cell lines must be reprogrammed 
from cultured adult cells, and could carry a mutational load due to 
normal in vivo somatic mutation. Furthermore, many hiPS cell repro- 
gramming methods use oncogenes that may increase the mutation 
rate. Additionally, some hiPS cell lines have been observed to contain 
large-scale genomic rearrangements and abnormal karyotypes after 
reprogramming”. Recent studies also revealed that tumour suppressor 
genes, including those involved in DNA damage response, have an 
inhibitory effect on nuclear reprogramming*”. These findings suggest 
that the process of reprogramming could lead to an elevated muta- 
tional load in hiPS cells. 

To probe this issue, we sequenced the majority of the protein-coding 
exons (exomes) of 22 hiPS cell lines and the nine matched fibroblast 
lines from which they came (Table 1). These lines were reprogrammed 
in seven laboratories using three integrating methods (four-factor retro- 
viral, four-factor lentiviral and three-factor retroviral) and two non- 
integrating methods (episomal vector and messenger RNA delivery 
into fibroblasts). All hiPS cell lines were extensively characterized for 
pluripotency and had normal karyotypes before DNA extraction 
(Supplementary Methods). Protein-coding regions in the genome were 
captured and sequenced from the genomic DNA of hiPS cell lines and 
their matched progenitor fibroblast lines using either padlock 
probes’*” or in-solution DNA or RNA baits'*”’. We searched for single 
base changes, small insertions/deletions and alternative splicing var- 
iants, and identified 12,000-18,000 known and novel variants for each 
cell line that had sufficient coverage and consensus quality (Table 1). 


hiPS cell lines contain a high level of mutational load 


We identified sites that showed the gain of a new allele in each hiPS cell 
line relative to their corresponding matched progenitor fibroblast 
genome. A total of 124 mutations were validated with capillary 
sequencing (Fig. 1, Table 2 and Supplementary Fig. 1), which revealed 
that each mutation was fixed in heterozygous condition in the hiPS cell 
lines. No small insertions/deletions were detected. For three hiPS cell 
lines (CV-hiPS-B, CV-hiPS-F and PGP1-iPS), the donor’s complete 
genome sequence obtained from whole blood is publicly available'*"; 
we used this information to further confirm that all 27 mutations in 
these lines were bona fide somatic mutations. Because 84% of the 
expected exomic variants’® were captured at high depth and quality, 
the predicted load is approximately six coding mutations per hiPS cell 
genome (see Table 1 for details). The majority of mutations were mis- 
sense (83 of 124), nonsense (5 of 124) or splice variants (4 of 124). 
Fifty-three mis-sense mutations were predicted to alter protein func- 
tion’” (Supplementary Table 1). Fifty mutated genes were previously 
found to be mutated in some cancers'*'*. For example, ATM is a well- 
characterized tumour suppressor gene found mutated in one hiPS cell 
line, and NTRK1 and NTRK3 (tyrosine kinase receptors) can cause 
cancers when mutated” and contained damaging mutations in three 
hiPS cell lines (CV-hiPS-F, iPS29e and FiPS4F-shpRB4.5) that were 
reprogrammed in three labs and came from different donors. Two 
kinase genes from the NEK family, which is related to cell division, 
were mutated in two independent hiPS cell lines. In addition to cancer- 
related genes, 14 of the 22 lines contained mutations in genes with 
known roles in human Mendelian disorders*’. Three pairs of hiPS cell 
lines (iPS17a and iPS17b, dH1F-iPS8 and dH1F-iPS9, and CF-RiPS1.4 
and CF-RiPS1.9) shared three, two and one mutation, respectively; 
Table 1 | Sequencing statistics for mutation discovery 


Cell line | Exome capture method | Quality-filtered sequence (bp) | No. of high-quality coding variants | dbSNP percentage | Shared high-quality coding region (bp) | No. of coding mutations observed/projected 

CV-hiPS-F | Padlock + SeqCap EZ | 9,928,014,640 | 15,595 | 98% | 6,374,878 | 14/15 
CV-hiPS-B | SeqCap EZ | 7,977,894,480 | 14,876 | 98% | 21,891,518 | 10/12 
CV fibroblast | Padlock + SeqCap EZ | 7,586,731,600 | 15,442 | 98% | — | — 
DF-6-9-9 | Padlock + SeqCap EZ* | 9,289,593,520 | 14,366 | 95% | 7,806,151 | 6/7 
DF-19-11 | SeqCap EZ | 3,212,662,880 | 13,792 | 95% | 21,342,017 | 7/9 
iPS4.7 | SeqCap EZ | 3,132,462,400 | 14,154 | 95% | 21,729,562 | 4/5 
Foreskin fibroblast | Padlock + SeqCap EZ* | 8,430,654,720 | 14,819 | 95% | — | — 
PGP1-iPS | SeqCap EZ | 4,599,556,400 | 14,105 | 95% | 9,681,915 | 3/4 
PGP1 fibroblast | SureSelect | 3,504,437,120 | 14,781 | 95% | — | — 
dH1F-iPS8 | SeqCap EZ | 3,950,994,160 | 13,552 | 96% | 6,874,057 | 8/10 
dH1F-iPS9 | SeqCap EZ | 3,945,196,800 | 14,191 | 95% | 21,536,158 | 3/4 
dH1F fibroblast | SeqCap EZ | 3,373,535,920 | 13,838 | 95% | — | — 
iPS11a | SureSelect | 1,836,303,440 | 13,845 | 95% | 18,557,098 | 4/5 
iPS11b | SureSelect | 3,378,603,200 | 15,152 | 95% | 17,206,934 | 7/8 
Hib11 fibroblast | SureSelect | 5,660,864,960 | 13,579 | 95% | — | — 
iPS17a | SureSelect | 4,805,756,800 | 5,039 | 95% | 17,888,773 | 4/5 
iPS17b | SureSelect | 7,129,037,520 | 5,400 | 95% | 19,902,076 | 5/6 
Hib17 fibroblast | SureSelect | 3,962,506,880 | 3,365 | 96% | — | — 
iPS29A | SureSelect | 4,112,237,360 | 3,464 | 94% | 17,328,182 | 2/3 
iPS29e | SureSelect | 1,669,916,080 | 3,800 | 94% | 18,985,791 | 7/9 
Hib29 fibroblast | SureSelect | 4,388,388,320 | 4,445 | 95% | — | — 
dH1cF16-iPS1 | SeqCap EZ | 4,321,661,440 | 5,061 | 95% | 19,601,528 | 2/2 
dH1cF16-iPS4 | SeqCap EZ | 4,668,085,920 | 4,958 | 95% | 23,956,732 | 6/7 
dH1cF16 fibroblast | SeqCap EZ | 4,178,664,160 | 4,879 | 95% | — | — 
CF-RiPS1.4 | SeqCap EZ | 4,733,743,840 | 1,344 | 96% | 21,272,233 | 2/3 
CF-RiPS1.9 | SeqCap EZ | 3,143,591,760 | 3,674 | 95% | 21,165,013 | 5/6 
CF fibroblast | SeqCap EZ | 3,204,874,880 | 1,855 | 96% | — | — 
FiPS3F1 | SeqCap EZ | 3,397,397,360 | 3,333 | 94% | 20,723,620 | 4/5 
FiPS4F7 | SeqCap EZ | 3,346,801,280 | 4,584 | 94% | 21,608,258 | 2/3 
HFFXF fibroblast | SeqCap EZ | 3,331,494,880 | 13,040 | 94% | — | — 
FiPS4F2p9 | SeqCap EZ | 4,725,258,400 | 18,033 | 92% | 25,188,054 | iW 
FiPS4F2p40 | SeqCap EZ | 4,848,006,000 | 18,376 | 92% | 25,411,595 | 11/11 
FiPS4F-shpRB4.5 | SeqCap EZ | 4,911,008,400 | 19,491 | 92% | 25,240,944 | 8/8 
IMR90 fibroblast | SeqCap EZ | 5,019,916,240 | 18,220 | 92% | — | — 


Quality-filtered sequence represents the total amount of sequence data generated that passed the Illumina GA IIx quality filter (bp, base pair). The number of high-quality coding variants is the number of variants 
found with a sequencing depth of at least eight and a consensus quality score of at least 30. The dbSNP percentage represents the percentage of identified variants present in the Single Nucleotide Polymorphism 
Database. The shared coding region is the portion of the genome, in base pairs, that was sequenced at high depth and quality in both the iPS cell line and its progenitor fibroblast. The number of coding mutations 
lists both the number of identified coding mutations and a projection of the total number of identified mutations based on the fraction of Consensus Coding Sequence variants?® (out of ~17,000 expected variants) 
successfully identified in both hiPS cells and fibroblasts. 


* For these cell lines, mutation calling was performed individually using both padlock probe data and hybridization-capture data. Each method found five mutations, four of which were shared, leading to a total of 
six mutations. Padlock probe and hybridization capture have separate strengths (specificity versus unbiased coverage); it seems that these factors directly affect the ability to find separate mutations. 
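
The quality thresholds and the projection described in the table notes can be sketched as follows. The sketch assumes that the projected count is the observed count divided by the fraction of expected coding variants that were captured, which reproduces the estimate of roughly six mutations per exome from an average of five observed at 84% capture. The per-line values in Table 1 may have been derived with additional detail not given here. Function names and the mutation labels are hypothetical.

```python
MIN_DEPTH = 8               # minimum sequencing depth for a high-quality variant
MIN_CONSENSUS_QUALITY = 30  # minimum consensus quality score


def is_high_quality(depth, consensus_quality):
    """Apply the depth and consensus-quality thresholds from the table notes."""
    return depth >= MIN_DEPTH and consensus_quality >= MIN_CONSENSUS_QUALITY


def projected_mutations(observed, fraction_captured):
    """Scale an observed mutation count to the whole exome (simple proportional model)."""
    if not 0 < fraction_captured <= 1:
        raise ValueError("fraction_captured must be in (0, 1]")
    return observed / fraction_captured


# Average of five observed mutations with 84% of expected variants captured.
print(round(projected_mutations(5, 0.84)))  # 6

# Combining two capture methods: five calls each, four shared, six in total.
padlock_calls = {"m1", "m2", "m3", "m4", "m5"}   # hypothetical labels
hybrid_calls = {"m1", "m2", "m3", "m4", "m6"}    # hypothetical labels
print(len(padlock_calls | hybrid_calls))  # 6
```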
Figure 1 | hiPS cells acquired protein-coding somatic mutations. Somatic 
mutations in the gene NTRK3 were found in two independent hiPS cell lines but 
were not present in their fibroblast progenitors. Detailed information for all 
mutations is in the Supplementary Information. 
these most probably arose in shared common progenitor cells before 
reprogramming. However, most hiPS cell lines derived from the same 
fibroblast line did not share common mutations (Table 2 and 
Supplementary Table 1). 

These data raise the possibility that a significant number of muta- 
tions occur during or shortly after reprogramming and then become 
fixed during colony picking and expansion. An alternative hypothesis 
is that the mutations we found are simply the result of age-accrued 
biopsy heterogeneity or normal somatic mutation during in vitro 
fibroblast cell culture. The skin biopsies were collected from donors 
of ages varying from newborn to 82 years; biopsy heterogeneity there- 
fore does not seem to have a primary role, as the mutational load is not 
correlated (squared linear correlation coefficient, R² = 0.046) with 
donor age (Supplementary Fig. 2). We attempted to grow clonal 
fibroblasts to obtain a control for single-cell mutational load, but a 
direct assessment was not possible owing to technical difficulties in 
mimicking the exact culture conditions (Supplementary Methods). 
Assuming that the skin biopsy is mutation free, we were able to use 
previously published values for the typical mutation rate in culture to 
obtain an expectation of ten times fewer mutations per genome than 
we observed (P< 1.27 X 10 °°; Supplementary Methods), indicating 
that hiPS cell mutational load is higher than normal-culture muta- 
tional load. We define the term ‘reprogramming-associated muta- 
tions’ to describe somatic mutations observed in these hiPS cell lines. 
Reprogramming-associated mutations could pre-exist at low frequencies 
in the fibroblast population, could occur during the reprogramming 
process or could occur after reprogramming. All reprogramming- 
associated mutations have become fixed in the hiPS cell population. 
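
The comparison between expected and observed mutational load can be illustrated with a Poisson tail probability. This is only a minimal sketch under stated assumptions, not the test described in the Supplementary Methods: the function name `poisson_sf` and the example counts are hypothetical, chosen to mirror the roughly tenfold excess reported above.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam).

    The upper tail is summed directly, which keeps small P values accurate.
    Intended for k above lam (the tail of interest here).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    # Start at P(X = k), computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while True:
        total += term
        i += 1
        term *= lam / i  # P(X = i) obtained from P(X = i - 1)
        # Past lam the terms shrink monotonically; stop once negligible.
        if i > lam and term <= total * 1e-17:
            break
    return total


# Hypothetical numbers: 0.5 mutations expected from culture alone, 5 observed.
expected_per_line = 0.5
observed_in_line = 5
p_value = poisson_sf(observed_in_line, expected_per_line)
print(f"P(X >= {observed_in_line} | lambda = {expected_per_line}) = {p_value:.3g}")
```

A small tail probability for a single line would already argue against culture-derived mutation alone; combining evidence across all sequenced lines would shrink it further.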
Cell line | Mutated genes | No. of non-silent mutations | No. detectable at low frequency in fibroblasts (present/tested)
--- | --- | --- | ---
CF-RiPS1.4 | OR52E8, TEAD4 | 1 | NA
CF-RiPS1.9 | OR52E8, FAM171A1, TMED9, TEAD4, RASEF | 3 | NA
CV-hiPS-B | MMP26, DYNC1H1, VMO1, DSC3, CELSR1, FLT4, UBE2CBP, ARHGEF5, IGF2BP3, DLG3 | 7 | 7/8
CV-hiPS-F | IQGAP3, SPEN, TNR, PBLD, OR6Q1, INTS4, GSG1, NTRK3, DNAH3, GOLGA4, FAT2, C6orf25, UBR5, SDR16C5 | 12 | 4/7
DF19.11 | SPATA21, RGS8, LPPR4, KCNJ8, SETBP1, ZNF471, TMEM40 | 5 | NA
DF6-9-9 | ZZZ3, AKR1C4, NEK5, DAPL1, ITCH, PPP1R2 | 5 | 0/5
dH1CF16-iPS1 | IRGQ, TM9SF4 | 1 | NA
dH1CF16-iPS4 | PKP1, MYOG, ABCA3, PTPRM, RANBP3L, CALN1 | 4 | NA
dH1F-iPS8 | CABC1 (ADCK3), C1orf100, OR5AN1, CACNG3, MYRIP, SLC1A3, DSP, KLRG2 | 6 | NA
dH1F-iPS9 | SEMA6C, MYRIP, SLC1A3 | 3 | NA
FiPS3F1 | SORCS3, GLRA3, CARM1, EPB41L1 | 2 | NA
FiPS4F7 | GDF3, ZER1 | 2 | NA
iPS11a | GTF3C1, SALL1, SLC26A3, ZNF16 | 3 | /1
iPS11b | MARCKSL1, PRDM16, ATM, LRP4, TCF12, SH3PX3 (SNX33), OSBPL3 | 5 | 0/1
iPS17a | HK1, ANKRD12, SCN1A, IFNGR1 | 4 | NA
iPS17b | HK1, CCKBR, ANKRD12, SCN1A, IFT122 | 5 | 1/1
iPS29A | PRICKLE1, RFX6 | 2 | 2/2
iPS29e | C14orf174 (SAMD15), NTRK3, VAC14, ASB3, STX7, POLR1C, LINGO2 | 6 | 1/4
iPS4.7 | POLE, UBA2, L3MBTL2, C4orf41 | 2 | NA
PGP1-iPS | C11orf67, OSBPL8, NEK11 | 1 | 1/3
FiPS4F2 | TMEM57, RANBP6, CTSL1, SAV1, KRT25, BCL2L12, LGALS1, TTYH2*, COPA*, ARSB*, MT1B* | 7 | NA
FiPS4F-shpRB4.5 | NTRK1, CD1B, LRCH3, SH3TC1, GPC2, CDK5RAP2, MYH4, TRMU | 5 | NA
The full details of each mutation are in Supplementary Table 1. 
* Mutation was observed at passage 40 but not at passage 9. FiPS4F2 was sequenced at both passage 9 and passage 40. Seven mutations were present after reprogramming (FiPS4F2P9), and four more became fixed after extended culturing (FiPS4F2P40). All seven mutations found after reprogramming were also present after extended culturing. 


Reprogramming-associated mutations arise through 
multiple mechanisms 

To test whether some observed mutations were present in the starting 
fibroblasts at low frequency before reprogramming, we developed a 
new digital quantification assay (DigiQ) to quantify the frequencies of 
32 mutations in six fibroblast lines using ultradeep sequencing 
(Supplementary Figs 3 and 4). We amplified each mutated region 
from the genomic DNA of 100,000 cells with a high-fidelity DNA 
polymerase and sequenced the pooled amplicons with an Illumina 
Genome Analyser at an average coverage of 10⁶. Although the raw 
sequencing error is roughly 0.1-1% with the Illumina sequencing platform, 
detection of rare mutations at a lower frequency is possible with 
proper filtering and careful selection of controls. For each fibroblast 
line, we included the mutation-carrying hiPS cell DNA as the positive 
control and a ‘mutation-free’ DNA sample as the negative control for 
sequencing errors (Supplementary Methods). Comparison of the allelic 
counts at the mutation positions between the fibroblast lines and the 
negative controls allowed us to distinguish rare mutations from 
sequencing errors and estimate the detection limit of the assay. 
Seventeen of the 32 mutations were found in fibroblasts in the range 
of 0.3-1,000 in 10,000, and 15 mutations were not detectable 
(Supplementary Tables 2 and 3). In each fibroblast line with more than 
one detectable rare mutation, the frequencies of the mutations were 
very similar, which suggests that a small subpopulation of each fibro- 
blast line contains all pre-existing hiPS cell mutations and that the rest 
of the cells lacked any of them. 

We extended this analysis by asking whether all of the hiPS cell 
mutations could have pre-existed in the fibroblast populations. For 
the 15 mutations not detected with the DigiQ assay, the detection 
limits can be estimated (Supplementary Methods). At seven of the 
15 sites, the sequencing quality was high enough that rare mutations 
at frequencies of 0.6-5 in 100,000 should be detectable with our assay 
(Supplementary Table 3). Because 30,000-100,000 fibroblast cells 
were used in the reprogramming experiments, we can rule out the 
presence of two mutated genes (NTRK3 and POLR1C) in more than 
one cell of the starting fibroblast population, and five others were 
present in no more than one or two cells. 
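
The arithmetic behind this argument can be made explicit. The sketch below assumes simple proportionality (detection limit multiplied by the number of starting cells); the function name `max_carrier_cells` is illustrative, and the per-site limits from Supplementary Table 3 are not reproduced here, only the ranges quoted in the text.

```python
def max_carrier_cells(detection_limit: float, n_cells: int) -> float:
    """Upper bound on mutation-carrying cells when a mutation is undetected.

    detection_limit is the lowest detectable allele frequency (a fraction),
    n_cells the number of fibroblasts used for reprogramming.
    """
    return detection_limit * n_cells


# Ranges quoted in the text: limits of 0.6-5 per 100,000 and
# 30,000-100,000 fibroblasts per reprogramming experiment.
for limit_per_100k in (0.6, 5.0):
    for n_cells in (30_000, 100_000):
        bound = max_carrier_cells(limit_per_100k / 100_000, n_cells)
        print(f"limit {limit_per_100k} in 100,000; {n_cells:,} cells: "
              f"fewer than {bound:.2f} carrier cells")
```

A bound below one cell means the mutation is unlikely to have been present in the starting population at all; bounds of one to a few cells correspond to the "no more than one or two cells" cases.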

As another test of the hypothesis that all of the mutations pre- 
existed in fibroblasts before reprogramming, we examined the exomes 
of two hiPS cell lines derived from fibroblast line dH1cf16, which was 
clonally derived from the dH1F fibroblast line and passaged the 
minimum amount to generate enough cells for reprogramming. The 
two hiPS cell lines derived from the non-clonal dH1F fibroblast line 
contained eight and three new mutations, respectively, that were not found in the 
fibroblasts; we observed a very similar independent mutational load in 
the clonal lines (six new mutations in the hiPS cell line dH1cf16-iPS1 
and two new mutations in the hiPS cell line dH1cf16-iPS4). Together, 
these experiments establish that although some of the reprogramming- 
associated mutations were likely to pre-exist in the starting fibroblast 
cultures, the others occurred during reprogramming and subsequent 
culturing. Specific distributions tend to vary across hiPS cell lines 
(Supplementary Table 3). 

Mutations that occur during reprogramming could be due in part to a 
significantly elevated mutation rate during reprogramming. It is also 
possible that selection could have an important role. We tested the 
possibility that an elevated mutation rate might occur because the repro- 
gramming process might be inducing transient repression of p53 (also 
known as TP53), RB1 and other tumour suppressor genes, which are 
known to inhibit reprogramming and are required for normal DNA 
damage responses. Simian virus 40 large-T antigen, which inactivates 
tumour suppressor and DNA damage response genes (including p53 
and RB1), was expressed during reprogramming of three analysed hiPS 
cell lines (DF6-9-9, DF19-11 and iPS4.7). Another hiPS cell line 
(FiPS4F-shpRB4.5) was generated while directly knocking down RB1 
(Supplementary Fig. 5). However, the observed mutational load was 
very similar in these lines in comparison with the others, indicating that 
reprogramming-associated mutations cannot be explained by an elevated 
mutation rate caused by p53 or RB1 repression. 

We also probed whether additional mutations could become fixed 
during extended passaging by extending our analysis of one hiPS cell line. 
Although most of our hiPS cell lines were sequenced at fairly low passage 
number (less than 20), to measure the effect of post-reprogramming 
culturing directly we also sequenced one hiPS cell line (FiPS4F2) at two 
passages (9 and 40). We discovered that all seven mutations identified 
in the passage-9 line remained fixed in the passage-40 line, but that 
four additional mutations were found to be fixed in the passage-40 cell 
line. 

To test the possibility that selection operates during hiPS cell 
generation, we performed an enrichment analysis to determine whether 
reprogramming-associated mutated genes were more likely to be 
observed than random somatic mutation in cancer cells. We used the 
COSMIC database as a source of genes commonly mutated in cancer. 
We discovered that the reprogramming-associated mutated genes were 
significantly enriched for genes found mutated in cancer (P = 0.0019; 
Supplementary Information), which implies that some mutations were 
selected during reprogramming. 
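
One common way to quantify this kind of gene-set overlap is a one-sided hypergeometric test. The sketch below is an illustration of that general technique only; the paper's own procedure is in the Supplementary Information and may differ, and every count in the example (genome size, cancer-gene list size, overlap) is hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, population: int, marked: int, draws: int) -> float:
    """Return P(X >= k) when `draws` genes are sampled without replacement
    from `population` genes, of which `marked` are on the cancer-gene list."""
    lower = max(k, 0)
    upper = min(marked, draws)
    favourable = sum(
        comb(marked, i) * comb(population - marked, draws - i)
        for i in range(lower, upper + 1)
    )
    return favourable / comb(population, draws)


# Hypothetical counts, for illustration only.
n_genes = 20_000        # protein-coding genes considered
n_cancer_genes = 4_000  # genes listed as mutated in cancer
n_mutated = 120         # reprogramming-associated mutated genes
n_overlap = 40          # mutated genes that are also on the cancer list

p_value = hypergeom_sf(n_overlap, n_genes, n_cancer_genes, n_mutated)
print(f"enrichment P = {p_value:.3g}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the individual terms; only the final ratio is converted to a float.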

As an alternative test of the selection hypothesis, we asked whether 
mutations associated with reprogramming could be functional, on the 
basis of the non-synonymous/synonymous (NS/S) ratio. Traditionally, 
the analysis of the NS/S ratio is applied to germline mutations that have 
evolved over a long period of evolutionary time, and is not directly 
applicable to somatic mutations. However, functional mutations are 
known to be positively selected in cancers, allowing us to make a direct 
comparison with mutation characteristics found in cancer genomes. 
Strikingly, the NS/S ratio is very similar between mutations identified 
in three recent cancer genome sequencing projects and the reprogramming-associated 
mutations we found (2.4/1 and 2.6/1, respectively), 
indicating that a similar degree of selection pressure may be present. 

We also checked whether reprogramming-associated mutations 
could provide a common functional advantage, through a pathway 
enrichment analysis using Gene Ontology terms. No statistically 
significant similarity was identified, indicating that mutated genes have 
varied cellular functions. Again, identical results were found when per- 
forming the same analysis on mutations identified during the genome 
sequencing of melanoma, breast cancer and lung cancer samples. 
This lack of enrichment in cancer genomes is generally thought to be 
due to the presence of many passenger mutations in cancer cells, which 
could also be the cause for reprogramming-associated mutations. 
Nonetheless, these analyses suggest that selection of potentially func- 
tional mutations could have a role in amplifying rare-mutation- 
carrying cells and, when coupled with the single-cell bottleneck in 
hiPS cell colony picking, could contribute to the fixation of initially 
low-frequency mutations throughout the entire hiPS cell population. 


Discussion 


Taken together, our results demonstrate that pre-existing and new 
mutations that occur during and after reprogramming all contribute 
to the high mutational load we discovered in hiPS cell lines. Although 
we cannot completely rule out the possibility that reprogramming itself 
is ‘mutagenic’, our data argue that selection during hiPS cell reprogram- 
ming, colony picking and subsequent culturing may be contributing 
factors. A corollary is that if reprogramming efficiency is improved to a 
level such that no colony picking and clonal expansion is necessary, the 
resulting hiPS cells could potentially be free of mutations. 

Despite the power of our experimental approach to identify and char- 
acterize reprogramming-associated mutations accurately, their func- 
tional significance remains to be shown. This issue parallels a general 
problem facing the genomics community: high-throughput sequencing 
technologies have allowed data generation rates to greatly outpace func- 
tional interpretation. Additionally, when considering the biological sig- 
nificance of reprogramming-associated mutations, there are two 
separate functional aspects to consider: whether some of these mutations 
contributed functionally to the reprogramming of cell fate, and whether 
some of these mutations could increase disease risk when hiPS-cell- 
derived cells/tissues are used in the clinic. These two aspects are not 
necessarily connected. Although the functional effects of the 124 muta- 
tions remain to be characterized experimentally, it is nonetheless striking 
that the observed reprogramming-associated mutational load shares 
many similarities with that observed in cancer. Furthermore, the obser- 
vation of mutated genes involved in human Mendelian disorders sug- 
gests that the risk of diseases other than cancer needs to be evaluated for 
hiPS-cell-based therapeutic methods. Future long-term studies must 
focus on functional characterization of reprogramming-associated 
mutations to aid further the creation of clinical safety standards. 

Safe hiPS cells are critical for clinical application. Therefore, just 
as previous findings of large-scale genome rearrangements in hiPS 
cell lines led to the introduction of karyotyping as a standard post- 
reprogramming protocol, routine genetic screening of hiPS cell lines 
to ensure that no obviously deleterious point mutations are present 
must become a standard procedure. Complete exome or genome 
sequencing of hiPS cell lines might be an efficient way to screen out 
hiPS cell lines that have a high mutational load or have mutations in 
genes implicated in development, disease or tumorigenesis. Further 
rigorous work on mutation rates and distributions during in vitro 
culturing and reprogramming of hiPS cells, and perhaps human 
embryonic stem cells, will be essential to help establish clinical safety 
standards for genomic integrity. 


METHODS SUMMARY 


CV-hiPS-F and CV-hiPS-B were reprogrammed from CV fibroblasts using four- 
factor retroviral vectors. PGP1-iPS cells were reprogrammed by Cellular 
Dynamics using the same four factors in a lentiviral vector from PGP1F fibroblasts. 
We obtained dH1F-iPS8, dH1F-iPS9, dH1cF16-iPS1, dH1cF16-iPS4, 
dH1cF16 and dH1F cells from previous cultures reprogrammed with retroviral 
vectors containing the same factors. We obtained DF-6-9-9, DF-19-11, iPS4.7 
and FS cells from previously existing cultures; the reprogramming process and 
characterization of lines have been described previously. We obtained iPS11a, 
iPS11b, iPS17a, iPS17b, iPS29A, iPS29e, Hib11, Hib17 and Hib29 cells from 
previous cultures reprogrammed using retroviral vectors encoding three or four 
factors. FiPS3F1 and FiPS4F7 were reprogrammed from HFFxF fibroblasts 
using similar protocols. FiPS4F2 and FiPS4F-shpRB4.5 were reprogrammed 
using the same four-factor protocol from IMR90 fibroblasts. We obtained the 
mRNA-derived lines (CF-RiPS1.4, CF-RiPS1.9 and CF fibroblasts) from previous 
cultures. All hiPS cell lines were extensively characterized for pluripotency. 
Fourteen lines were tested for teratoma formation and shown to express all 
embryonic germ layers in vivo. DNA was extracted from each cell type using 
Qiagen’s DNeasy kit. 

Exome capture was performed with either a library of padlock probes, com- 
mercial hybridization-capture DNA baits (NimbleGen SeqCap EZ) or RNA baits 
(Agilent SureSelect), and the resulting libraries were sequenced on an Illumina 
GA IIx sequencer. We rejected putative mutations if they were known poly- 
morphisms or contained any minor allele presence in the fibroblast. All candidate 
mutations were confirmed using capillary Sanger sequencing. 
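
The filtering logic described here can be sketched as a simple predicate. This is a hedged illustration, not the authors' pipeline: the record type `VariantCall` and its field names are hypothetical, and combining the depth and consensus-quality thresholds from the table footnote (at least eight and at least 30) with the polymorphism and fibroblast checks in one function is an assumption made for brevity.

```python
from dataclasses import dataclass

MIN_DEPTH = 8               # minimum sequencing depth (from the table footnote)
MIN_CONSENSUS_QUALITY = 30  # minimum consensus quality score (same footnote)


@dataclass(frozen=True)
class VariantCall:
    """Hypothetical per-site record for an hiPS cell line and its progenitor."""
    gene: str
    depth: int
    consensus_quality: int
    known_polymorphism: bool   # e.g. listed in dbSNP
    fibroblast_alt_reads: int  # reads supporting the variant in the fibroblast


def is_candidate_mutation(call: VariantCall) -> bool:
    """Keep high-quality variants that are neither known polymorphisms
    nor visible as a minor allele in the progenitor fibroblast."""
    return (
        call.depth >= MIN_DEPTH
        and call.consensus_quality >= MIN_CONSENSUS_QUALITY
        and not call.known_polymorphism
        and call.fibroblast_alt_reads == 0
    )


calls = [
    VariantCall("GENE_A", depth=25, consensus_quality=60,
                known_polymorphism=False, fibroblast_alt_reads=0),
    VariantCall("GENE_B", depth=25, consensus_quality=60,
                known_polymorphism=False, fibroblast_alt_reads=2),
    VariantCall("GENE_C", depth=5, consensus_quality=60,
                known_polymorphism=False, fibroblast_alt_reads=0),
]
candidates = [c.gene for c in calls if is_candidate_mutation(c)]
print(candidates)
```

Candidates that pass such a filter would still need independent confirmation, which the text reports was done by capillary Sanger sequencing.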

For digital quantification, mutations were PCR-amplified and sequenced using 
an Illumina GA IIx. These libraries were sequenced to obtain on average 
1,000,000 independent base calls for each location. A binomial test was then used 
to determine whether the observed minor allele frequency could be separated 
from error and to estimate the frequency of each mutation. 
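
The binomial test mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the published DigiQ analysis: the per-site error rate is assumed to come from the negative control, the function names `binom_sf` and `call_rare_mutation` are hypothetical, and the read counts, error rate and significance threshold in the example are invented.

```python
import math


def binom_sf(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        term = math.exp(log_pmf)
        total += term
        # Beyond the mean the terms only shrink; stop once they are negligible.
        if i > n * p and term <= total * 1e-17:
            break
    return total


def call_rare_mutation(alt_reads: int, total_reads: int,
                       error_rate: float, alpha: float = 0.01):
    """Test whether alt_reads exceeds what sequencing error alone would give.

    Returns (detected, p_value, estimated_frequency); the frequency estimate
    simply subtracts the assumed error rate from the observed allele fraction.
    """
    p_value = binom_sf(alt_reads, total_reads, error_rate)
    estimated = max(0.0, alt_reads / total_reads - error_rate)
    return p_value < alpha, p_value, estimated


# Hypothetical site: 1,000,000 base calls, error rate 2 in 10,000.
print(call_rare_mutation(400, 1_000_000, 2e-4))
print(call_rare_mutation(200, 1_000_000, 2e-4))
```

With about a million independent base calls per site, even a doubling of the minor-allele count over the error background is clearly separable, whereas a count equal to the background is not.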


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


CV fibroblast derivation. Primary fibroblasts were established from a 4-mm 
dermal punch biopsy of a 63-year-old male using a protocol based on 
Takashima’s method. The biopsy and subsequent reprogramming protocols 
and the informed-consent documents were reviewed and approved by the 
UCSD institutional ESCRO and IRB. Briefly, collagenase type 1A (Sigma) was 
used to dissociate the biopsy and cells were cultured in fibroblast media (DMEM 
containing 15% FBS, penicillin/streptomycin, sodium pyruvate, non-essential 
amino acids and L-glutamine). Fibroblasts were reprogrammed at passage 5. 
DNA was isolated for sequencing from 3,000,000 fibroblasts at passage 9. 
CV-hiPS-B and CV-hiPS-F derivation. For reprogramming, ~100,000 fibro- 
blasts per well were transduced with pCX4 retroviral vectors encoding OCT4 
(POU5F1), SOX2, KLF4, c-MYC (MYC) and ±EGFP. CV-hiPS-B and CV- 
hiPS-F were derived from the +EGFP and −EGFP transductions, respectively. 
Transduced fibroblasts were trypsinized and seeded onto irradiated mouse 
embryonic fibroblasts (MEFs) and cultured in HUES media. Cultures were 
treated with 2mM valproic acid for the first seven days post-transduction and 
10 nM Y-27632 for the first three weeks (both from EMD Chemicals). After about 
three weeks post-transduction, individual colonies that morphologically 
resembled hES were isolated and expanded. Established hiPS cell lines were 
maintained in HUES media, and cultures were dissociated for subculturing using 
0.05% trypsin/EDTA. DNA for sequencing was isolated from CV-hiPS-B and 
CV-hiPS-F at passages 13 and 9, respectively. 

CV-hiPS characterization. For PCR analysis with reverse transcription, hiPS 
cells were purified away from MEFs by passage onto Matrigel. Cells were collected 
and total RNA was isolated with the Ambion PaRIS kit following manufacturer's 
protocols. First-strand complementary DNA was generated with Superscript II 
(Invitrogen) following manufacturer’s protocols. cDNA was amplified with primers 
specific for endogenous SOX2, NANOG and OCT4 for 30 cycles. For immunofluor- 
escence experiments, cells were passaged onto Matrigel-coated coverslips and 
samples were processed using standard methods. Antibodies were used at the 
following dilutions: NANOG (Santa-Cruz Biotechnology, 1:200), Tra-1-81 (BD 
Biosciences, 1:500), SOX2 (Chemicon, 1:2,000). Cell Line Genetics performed 
karyotype analysis of CV hiPS cell lines. For embryoid body generation, hiPS cells 
were passaged with dispase and plated in suspension culture in embryoid body 
media (DMEM, 20% FBS, L-glutamine and NEAA) for eight days. On day eight, 
embryoid bodies were plated onto either Matrigel- or polyornithine/laminin- 
coated coverslips and cultured in either embryoid body media (for endoderm/ 
mesoderm) or neural differentiation media (DMEM-F12, glutamax, N2 and B27) 
supplemented with dbcAMP, BDNF and GDNF (for neuroectoderm) for eight 
days. On day nine, cells were fixed and processed for immunofluorescence as 
described above. 

CV-hiPS-B was purified away from MEFs by culturing on Matrigel (BD 
Biosciences) for two passages. CV-hiPS-F was purified by dissociation with 
Accutase (Innovative Cell Technologies), staining with TRA-1-81 antibody (BD 
Biosciences) and purifying 5,000,000 TRA-1-81⁺ cells using a BD Biosciences 
FACSAria II flow cytometer. 
dH1F-iPS8 and dH1F-iPS9 derivation. The dH1F fibroblast line was derived 
from the H1-OGN line previously. dH1F-iPS8 and dH1F-iPS9 were reprogrammed 
with human OCT4, SOX2, KLF4 and c-MYC retroviral vectors from 
dH1F at passage 5. Briefly, 293T cells in 15-cm plates were transfected with 6.25 µg 
of retroviral vector, 0.75 µg of VSVG vector and 5.625 µg of Gag-Pol vector using 
FUGENE 6 reagents. Three days after transfection, supernatants were filtered 
through a 0.45-µm cellulose acetate filter, concentrated by centrifugation at 
23,000 r.p.m. for 90 min and stored at −80 °C until use. Transductions were carried 
out on dH1F fibroblast cells in six-well plates (100,000 cells per well). Viruses were 
added at a multiplicity of infection of five. Three days after infection, cells were split 
into plates pre-seeded with MEFs. The medium was changed to human ES culture 
medium five days after infection. hiPS cell clones started to emerge about two to 
three weeks later and were picked and expanded in standard human ES cell culture 
medium (DMEM/F12 containing 20% KOSR, 10 ng ml⁻¹ human recombinant 
basic fibroblast growth factor, 1× NEAA, 5.5 mM 2-ME, 50 units ml⁻¹ penicillin 
and 50 µg ml⁻¹ streptomycin). During cell collection, MEFs were removed by 
suction pump and collagenase (Gibco) was used to lift the cells. For dH1F, cells 
were cultured in 10% FBS DMEM. Trypsin-EDTA was used to lift the cells from the 
plate for collection. DNA was extracted using a Qiagen DNeasy kit at the following 
passage numbers: 12 (dH1F), 19 (dH1F-iPS8), 17 (dH1F-iPS9). 
hiPS 11a, 11b, 17a, 17b, 29A and 29e derivation. Human fibroblasts were generated 
from 3-mm forearm dermal biopsies following informed consent under an IRB 
approved by Harvard University. The murine leukaemia retroviral vectors pMXs 
containing the human cDNAs for KLF4, SOX2 and OCT4 were modified to 
produce higher-titer virus by including the woodchuck post-transcriptional 
responsive element of FUGW (Addgene plasmid 14883) downstream of the 
cDNA. VSV-g pseudotyped viruses were packaged and concentrated by the 
Harvard Gene Therapy Initiative at Harvard Medical School. To produce hiPS 
cells, 30,000 human fibroblasts were transduced at a multiplicity of infection of 
10-15 with viruses containing all three genes in hES medium with 8 µg ml⁻¹ 
polybrene. Cells were incubated with virus for 24 h before medium was changed 
to standard fibroblast medium for 48 h. Cells were subsequently cultured in standard 
hES medium and hiPS cell colonies were manually picked on the basis of mor- 
phology within 2-4 weeks. Derived hiPS cell lines (11a, 11b, 17a, 17b and 29e) have 
been extensively characterized by standard assays including staining for markers of 
pluripotency by immunocytochemistry, cell cycle analysis, three-germ-layer differentiation 
potential in vitro and in vivo, and karyotype analysis. All cell cultures 
were maintained at 37 °C in 5% CO₂. Human fibroblasts were cultured in KO- 
DMEM (Invitrogen), supplemented with 20% Earle’s salts 199 (Gibco) and 10% 
hyclone (Gibco), 1× GlutaMax, penicillin/streptomycin (Invitrogen) and 100 µM 
2-mercaptoethanol. hiPS cells were maintained on gelatinized tissue culture plastic 
on a monolayer of irradiated CF-1 MEFs (GlobalStem), in hES media supplemented 
with 20 ng ml⁻¹ of bFGF. The medium was changed every 24 h and lines 
were passaged by trypsinization (0.5% trypsin EDTA, Invitrogen) or dispase 
(Gibco, 1 mg ml⁻¹ in hES media for 30 min at 37 °C). hiPS cell lines 11a, 11b, 
17a, 17b, 29A and 29e were purified from MEFs by using dispase, which selectively 
detaches stem cells, and then were washed twice to ensure removal of any con- 
taminating MEFs. Genomic DNA was extracted with a Qiagen DNeasy kit at the 
following passages: 7 (hFib17), 20 (iPS17A), 23 (iPS17B), 7 (hFib11), 24 (hFib11a), 
20 (hFib11b), 8 (hFib29), 21 (hFib29e), 36 (hFib29A). 

HFFxF fibroblast derivation. Primary fibroblasts were established from a foreskin 
biopsy of a three-year-old individual as detailed in ref. 33. Briefly, a skin 
sample was placed in sterile saline solution, divided into small pieces and allowed 
to be attached to cell culture dishes before the addition of xeno-free human 
foreskin fibroblast growth medium. Fibroblasts generated under xeno-free con- 
ditions (HFFxF) were reprogrammed at passage 3. DNA was isolated for sequen- 
cing from 4,000,000 HFFxF fibroblasts at passage 4 with a Qiagen DNeasy kit. 
FiPS3F1 and FiPS4F7 generation. For reprogramming, about 100,000 fibro- 
blasts per six-well plate were transduced with 1 ml of retroviral supernatants 
encoding FLAG-tagged OCT4, SOX2, KLF4, and c-MYC(T58A) as described in 
ref. 34. High-titer VSV-G-pseudotyped retroviruses expressing a polycistronic 
vector encoding for OCT4, SOX2, KLF4 and GFP (pMXs OSKG) and containing 
5 mg ml⁻¹ polybrene were produced as described in ref. 35. Infection was performed 
as indicated previously. Colonies were picked on the basis of morphology 
25-35 days after the initial infection and plated onto fresh irradiated XF HFF 
(iXF HFF) cells. Xeno-free iPS cell lines FiPS3F1 and FiPS4F7 were maintained by 
mechanical dissociation in XF-hESm, which is composed of KO-DMEM 
(Dulbecco’s modified Eagle’s medium; Invitrogen) supplemented with 15% 
xeno-free KO-SR (Invitrogen), xeno-free KO-SR growth factor cocktail (x1), 
2 mM glutamax, 50 mM 2-mercaptoethanol, penicillin/streptomycin (0.5, all 
from Invitrogen), non-essential amino acids (Cambrex) and 20 ng ml⁻¹ bFGF 
(Peprotech). 

FiPS3F1 and FiPS4F7 characterization. Derived hiPS cell lines FiPS3F1 and 
FiPS4F7 have been extensively characterized by staining for markers of pluripo- 
tency by immunofluorescence analyses. The following antibodies were used: 
MAB4360 for Tra-1-60 (1:200), MAB4381 for Tra-1-81 (1:200) and AB5603 
for SOX2 (1:500, all from Chemicon); MC-813-70 for SSEA-4 (1:2) and MC- 
631 for SSEA-3 (1:2, both from the Developmental Studies Hybridoma Bank at 
the University of Iowa); C-10 for OCT4 (1:100, Santa Cruz); EB06860 for NANOG 
(1:100, Everest Biotechnology); and Anti-FLAG (Sigma M2). Three-germ-layer 
differentiation potential in vitro was conducted by means of embryoid body 
formation, which was induced from colony fragments mechanically collected. 
For endoderm, embryoid bodies were cultured in KO-DMEM medium supplemented 
with 10% FBS, 2 mM L-glutamine, 0.1 mM 2-β-mercaptoethanol, non- 
essential amino acids and penicillin/streptomycin. For mesoderm differentiation, 
the same medium described above in the presence of ascorbic acid (0.5 mM) was 
used. For ectoderm induction, embryoid bodies were cultured in N2/B27 medium 
with the stromal cell line PA6 for two weeks. The medium for each condition was 
changed every other day. On day 15, cells were fixed and processed for immunofluorescence 
for the following antibodies: Tuj1 (1:500, Covance), α-fetoprotein 
(1:400), α-actinin (1:100, Sigma). Teratoma formation assay was performed by 
injecting about 0.5 × 10⁶ XF-iPS cells into the testes of severe combined immunodeficient 
beige mice (Charles River Laboratories). Mice were euthanized eight 
weeks after cell injection, and tumours were processed and analysed following 
conventional immunohistochemistry protocols (Masson’s trichromic stain) and 
immunofluorescence staining for Tuj1 (1:500, Covance), α-fetoprotein (1:400) 
and α-actinin (1:100, Sigma). Expression of retroviral transgenes and endogenous 
pluripotency-associated factors was measured by quantitative PCR with reverse 
transcription, as described previously. hiPS cell lines FiPS3F1 and FiPS4F7 
were purified from iXF HFF by mechanical dissociation and further culturing on 
Matrigel (BD Biosciences) for two more passages. DNA for sequencing was iso- 
lated from passage 9 for both FiPS3F1 and FiPS4F7 with a Qiagen DNeasy kit. 
CF-Fib, CF-RiPS1.4 and CF-RiPS1.9 derivation. CF fibroblasts (CF-Fib) were 
previously obtained from a skin biopsy taken from an adult with cystic fibrosis, 
with proper informed consent. CF-induced pluripotent stem cell lines were 
derived using modified mRNAs coding reprogramming factors OCT4, SOX2, 
KLF4, c-MYC and LIN28 (OSKML) with molar concentrations in the ratio 
3:1:1:1:1, in an atmosphere with 5% oxygen, as previously described. Briefly, 
50,000 fibroblasts were plated onto γ-irradiated human neonatal fibroblast feeders 
(GlobalStem) seeded at 33,00 cells cm⁻². For CF-RiPS derivations, the cationic 
lipid delivery system RNAiMAX was used. First, pooled RNA from the five factors 
OSKML (100 ng ml⁻¹) was diluted ×5 and the reagent (5 µl of RNAiMAX per 
microgram of RNA) was diluted ×10 in Opti-MEM basal media (Invitrogen). 
These components were pooled and incubated for 15 min at room temperature 
before being dispensed to culture media. Nutristem medium was replaced daily 4 h 
after transfection, and supplemented with 100 ng ml⁻¹ bFGF and 200 ng ml⁻¹ 
B18R (eBioscience). CF-RiPS derivation was performed in low oxygen (5%) in a 
NAPCO 8000 WJ incubator (Thermo Scientific). Medium was equilibrated in 5% 
oxygen for approximately 4h before use and cultures were passaged with TrypLE 
Select recombinant protease (Invitrogen) on days five and six. The daily RNA dose 
applied in the RiPSC derivations was 1,200 ng per well (six-well plate format). On 
day 21, RiPS colonies were mechanically picked and transferred to MEF-coated 
24-well plates with standard hESC medium (DMEM/F12 containing 20% KOSR 
(Invitrogen), 10 ng ml⁻¹ bFGF (Gembio), 1× NEAA (Invitrogen), 0.1 mM β-ME 
(Sigma), 1 mM L-glutamine (Invitrogen), 50 units ml⁻¹ penicillin and 50 µg ml⁻¹ 
streptomycin) containing 5 mM Y27632 (BioMol). Clones were mechanically 
passaged once more to MEF-coated six-well plates, and then expanded via enzym- 
atic passaging with collagenase IV (Invitrogen). Genomic DNA was extracted with 
a Qiagen DNeasy kit at the following passages: 9 (CF-Fib), 5 (CF-RiPS1.4), 5 (CF- 
RiPS1.9). 

FiPS4F2 and FiPS4F-shpRb4.5 plasmid construction. pMX-Oct4, pMX-SOX2, 
pMX-KLF4, pMX-cMyc and pLVTHM were obtained from Addgene (plasmids 
17217, 17218, 17219, 17220 and 12247, respectively). For the generation of the 
mammalian lentiviral plasmid encoding small hairpin RNAs against pRb, specific 
oligonucleotides (forwards, 5′-CGCGTGTTTCCTCTTCCAAAGTAATTCAAGAGATTACTTTGGAAGAGGAAACTTTTTTGGAAAT-3′; 
reverse, 5′-CGATTTCCAAAAAAGTTTCCTCTTCCAAAGTAATCTCTTGAATTACTTTGGAAGAGGAAACA-3′) 
were annealed, phosphorylated with T4 kinase and ligated 
into MluI/ClaI-linearized pLVTHM plasmid. The design of the small hairpin RNA 
was carried out using the SFOLD software (http://sfold.wadsworth.org/). All con- 
structs generated were subjected to direct sequencing to rule out the presence of 
mutations. 
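
The two oligonucleotides above can be checked for internal consistency with a few lines of code. This is an illustrative sketch only: the overhang lengths (4 nt on the forward oligo, 2 nt on the reverse) and the stem and loop coordinates are inferred here from the sequences themselves and are assumptions, not values stated in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Sequences as printed in the text (5' to 3').
FORWARD = ("CGCGTGTTTCCTCTTCCAAAGTAATTCAA"
           "GAGATTACTTTGGAAGAGGAAACTTTTTTGGAAAT")
REVERSE = ("CGA"
           "TTTCCAAAAAAGTTTCCTCTTCCAAAGTAATCTCTTGAATTACTTTGGA"
           "AGAGGAAACA")

# Assumed single-stranded 5' overhangs left after annealing.
FORWARD_OVERHANG = 4
REVERSE_OVERHANG = 2

# The annealed duplex should pair everything except the two overhangs.
duplex_ok = (reverse_complement(FORWARD[FORWARD_OVERHANG:])
             == REVERSE[REVERSE_OVERHANG:])

# Assumed hairpin layout within the forward oligo: 19-nt stem, 9-nt loop.
sense = FORWARD[5:24]
loop = FORWARD[24:33]
antisense = FORWARD[33:52]
hairpin_ok = reverse_complement(sense) == antisense

print("duplex pairs outside overhangs:", duplex_ok)
print("sense / loop / antisense:", sense, loop, antisense)
print("stem arms are reverse complements:", hairpin_ok)
```

Under these assumptions the forward oligo carries a 5′-CGCG overhang and the reverse oligo a 5′-CG overhang, which is consistent with ligation into a vector cut with MluI and ClaI.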

FiPS4F2 and FiPS4F-shpRb4.5 retroviral and lentiviral production. Moloney- 
based retroviral vectors (pMX-) were co-transfected with packaging plasmids 
(pCMV-gag-pol-PA and pCMV-VSVg) in 293T cells using Lipofectamine 
(Invitrogen). Retroviral supernatants were collected 24h after transfection, and 
passed through a 0.45-µm filter. Second-generation lentiviral vectors (pLVTHM-) 
were co-transfected with packaging plasmids (psPAX2 and pMD2.G, obtained 
from Addgene, 12260 and 12259, respectively) in 293T cells using Lipofectamine 
(Invitrogen). Lentiviral supernatants were collected 36h after transfection. 
FiPS4F2P9, FiPS4F2P40 and FiPS4F-shpRb4.5 derivation. Briefly, for the 
formation of hiPS cells IMR90 fibroblasts were infected with equal proportions 
of retroviruses encoding for OCT4, SOX2, KLF4 and c-MYC plus empty lenti- 
viruses (used to generate the FiPS4F2 line) or lentiviruses encoding small hairpin 
RNA against pRb (used to generate the line FiPS4F-shpRb4.5) by spinfection of 
the cells at 1,850 r.p.m. for 1 h at room temperature in the presence of polybrene 
(4 µg ml⁻¹). After two serial infections, cells were passaged onto fresh MEFs and 
switched to hES cell medium (DMEM/F12 (Invitrogen) supplemented with 20% 
Knockout serum replacement (Invitrogen), 1 mM L-glutamine, 0.1 mM non- 
essential amino acids, 55 mM β-mercaptoethanol and 10 ng ml⁻¹ bFGF (Joint 
Protein Central)) four days after the first infection. For the derivation of hiPS cell 
lines, colonies were manually picked and maintained on fresh MEF feeder layers 
for five passages before the growth in Matrigel/mTesR1 (Stem Cell Technologies) 
conditions. DNA was extracted after nine passages for FiPS4F2P9 and FiPS4F-shpRb4.5 
and after 40 passages for FiPS4F2P40. 

FiPS4F2 and FiPS4F-shpRb4.5 characterization. Cell pellets were lysed in 
10 mM Tris-HCl (pH 8), 150 mM NaCl, 1% Triton X-100, 1 mM Na3VO4, 1 mM 
PMSF and the Complete protease inhibitor mixture (Roche). Total protein extracts 
(25 µg) were separated by SDS-PAGE, transferred to nitrocellulose membranes 
(Amersham Biosciences) and analysed using primary antibodies against OCT4 
(sc-5279, Santa Cruz), SOX2 (AB5603, Chemicon), RB1 (554136, Pharmingen) 
and Tubulin (T5168, Sigma). Horseradish-peroxidase-conjugated secondary anti-mouse 
or anti-rabbit antibodies were purchased from Cell Signaling and used at 1:5,000 dilution. 
Tubulin was used as a loading control. Immunoblots were visualized using 
SuperSignal solutions following the manufacturer's instructions (Thermo 
Scientific). Total RNA was isolated using TRIzol Reagent (Invitrogen), and 
cDNA was synthesized using the SuperScript II Reverse Transcriptase kit for 
RT-PCR (Invitrogen). Real-time PCR was performed using the SYBR-Green 
PCR Master mix (Applied Biosystems). Values of gene expression were normalized 
using GAPDH expression and are shown as fold change relative to the value of the 
sample control. All the samples were done in triplicate. Primer sequences are 
available upon request. The hiPS cell lines were cultured in the presence of 20 ng ml⁻¹ 
colcemid for 45 min. The cells were trypsinized, washed with PBS and resuspended 
in a hypotonic solution by drop-wise addition while vortexing at low speed. 
After 10 min of incubation at 37 °C, cells were fixed by drop-wise addition of 1 ml of 
cold Carnoy’s fixative. Stained metaphases were analysed with CYTOVISION 
software (Applied Imaging). Teratoma analyses were performed as described in 
ref. 34. 

Preparation of padlock probes. The design and preparation of padlock probes 
was based on published methods. Libraries of long oligonucleotides (140 
nucleotides) that cover different exonic regions were synthesized from programmable 
microarrays (Agilent Technologies). The libraries were amplified by performing 
48-96 PCR reactions (100 µl each) with 0.02 nM template 
oligonucleotides, 200 nM Ap1V4IU primer (G*T*AGACTGGAAGAGCAC 
TGTU), 200 nM Ap2V4 primer (/5Phos/TAGCCTCATGCGTATCCGAT), 
×0.2 SybrGreen I and 50 µl Econo Taq PLUS master mix (Lucigen), at 94 °C 
for 2 min, and then 17 cycles at 94 °C for 30 s, 58 °C for 30 s and 72 °C for 30 s, 
and a final extension at 72 °C for 3 min. The amplicons were then purified by ethanol precipitation. 
Libraries were then digested with 40 units of Lambda Exonuclease (5 U µl⁻¹, 
NEB) in ×1 Lambda Exonuclease buffer (NEB) at 37 °C for 2 h, followed by 
purification with four Qiagen Qiaquick PCR purification columns for every 48 
wells of PCR products. Approximately 8 µg of the purified PCR amplicons were 
digested with ten units of DpnII (50 U µl⁻¹) and ×1 DpnII buffer at 37 °C for 2 h, 
followed by the addition of four units of USER enzyme (1 U µl⁻¹, NEB) at 37 °C 
for 4 h. The digested DNA was resolved by 6% PAGE and purified as single-stranded, 
102-nucleotide probes. 

Multiplex capture of exonic regions. Padlock probes (600 nM total concentration), 
250 ng of genomic DNA, 1 nM suppressor oligonucleotides and ×1 
Ampligase buffer (Epicentre) were mixed in a 15-µl reaction and denatured at 
95 °C for 10 min, then gradually cooled at the rate of 0.1 °C s⁻¹ to 60 °C. The 
mixture was hybridized at 60 °C for 24 h. To circularize the captured targets, the 
reactions were then incubated at 60 °C for another 24 h after adding 2 µl of gap-filling 
mix (two units of AmpliTaq Stoffel (Life Technology), four units of 
Ampligase (Epicentre), and 500 pmol total dNTP). After circularization, 2 µl of 
exonuclease mix containing 10 U µl⁻¹ exonuclease I (USB) and 100 U µl⁻¹ exonuclease 
III (USB) was added to digest the linear DNA, and the reactions were 
incubated at 37 °C for 2 h and then inactivated at 94 °C for 5 min. 

Amplification of capture circles. The 15-µl circularization products were placed 
in 100-µl PCR reactions with 200 nM of each primer (NH2-CAGATGTTATCGA 
GGTCCGAC, NH2-GGAACGATGAGCCTCCAAG), ×0.2 SybrGreen I and ×1 
Phusion High-Fidelity PCR Master Mix (NEB) at 98 °C for 1 min, and then 16 
cycles at 98 °C for 10 s, 58 °C for 20 s and 72 °C for 20 s, and a final extension at 72 °C for 3 min. The 
amplicons of the expected size range (200 bp) were purified using Qiagen 
Qiaquick columns. 

Shotgun sequencing library construction. Purified PCR products with the four 
probe sets on the same template DNA were pooled in equal molar ratio. The PCR 
products were transferred into Covaris microTubes with snap caps for Covaris 
AFA shearing using a 10% duty cycle, an intensity setting of 5 and 200 cycles per 
burst. The sheared DNA was concentrated to 85 µl using a vacufuge, and was then 
prepared for sequencing library construction using NEBNext DNA Sample Prep 
Master Mix Set 1 (NEB). The fragmented DNA was end-repaired at room temperature 
for 30 min in a 100-µl reaction consisting of ×1 NEBNext End Repair 
Reaction Buffer and 5 µl of NEBNext End Repair Enzyme Mix. The DNA was 
then purified with Qiagen Qiaquick columns. Approximately 500 ng to 1 µg of the 
end-repaired blunt DNA was incubated in a thermal cycler for 30 min at 37 °C 
along with ×1 NEBNext dA-Tailing Reaction Buffer and 3 µl of Klenow fragment. 
The DNA was again purified using Qiagen Qiaquick columns. The purified 
DNA was size-selected (125-150 nucleotides) using E-Gel SizeSelect 2% 
(Invitrogen) and concentrated to 36 µl using a vacufuge (Eppendorf). The dA-tailed 
DNA was then ligated at room temperature for 15 min with ×1 Quick 
Ligation Reaction Buffer, 1.6 nM Illumina ligation adaptors and 2 µl of Quick T4 
DNA ligase. Ligation products were purified using Qiagen Qiaquick columns and 
amplified by PCR in 100-µl reactions with 15 µl of template, 200 nM Illumina 
PCR primers, ×0.2 SybrGreen I and ×1 Phusion High-Fidelity PCR Master Mix 
(NEB) at 98 °C for 1 min, and then eight cycles at 98 °C for 10 s, 65 °C for 20 s 
and 72 °C for 15 s, and a final extension at 72 °C for 3 min. The PCR amplicons were purified with 
Qiaquick PCR purification columns, size-selected (200-275 nucleotides) using 
6% PAGE and sequenced on an Illumina Genome Analyser IIx. 
Hybridization capture with DNA or RNA baits. Liquid exome capture was 
performed using the commercial Roche NimbleGen SeqCap EZ Exome kit or 
the commercial Agilent SureSelect kit (Table 1). Experiments were performed 
following the manufacturers’ protocols. Briefly, genomic DNA was sheared and 
ligated to Solexa sequencing adaptors. DNA was then hybridized with the SeqCap 
EZ Exome library or SureSelect RNA baits to capture exomic regions. Exome 
regions were captured with streptavidin beads and then PCR-amplified with 
Illumina sequencing adaptors. The resulting libraries were sequenced on an 
Illumina Genome Analyser IIx. 
Consensus sequence generation and variant calling. Reads obtained from the 
Illumina Genome Analyser were post-processed and quality filtered using 
GERALD. The end of each read was then mapped to the padlock-probe capturing 
arm sequences using Bowtie; any reads that successfully mapped were discarded 
to prevent bias from capturing arms. Reads were then mapped to the whole 
genome using Bowtie or BWA. Any read that could not be mapped uniquely 
was discarded to reduce false positives due to sequence homology. The 5’ and 3’ 
ends of reads were then trimmed to reduce the effect of sequencing errors, which 
tend to occur near the beginnings and ends of reads on the Illumina platform. (No 
trimming was performed when GATK was used for variant calling.) To reduce 
errors introduced by pre-sequencing amplification, mapped reads that started 
and ended at identical locations were then removed using SAMtools or Picard to 
account for these clonal reads. SAMtools or GATK was then used to generate a 
consensus sequence for each sample by combining the results of each read that 
mapped to each exomic location. A minimum read depth of eight and consensus 
quality of 30 was required at every examined location. The consensus sequences 
were then compared to look for candidate novel mutations in hiPS cells. Variants 
that occurred at locations present in the dbSNP database (version 130) were 
removed from consideration to reduce the false-positive rate, as a novel mutation 
in the hiPS cell line is very unlikely to have been previously characterized in other 
cell lines and was most probably just not observed in the fibroblast line owing to 
stochastic sequencing bias. Because sequencing depth was relatively low in a small 
fraction of exomic regions, allelic imbalance can also lead to false positives, as sites 
in the fibroblast genome could, for example, be heterozygous but be sequenced as 
seven copies of the major allele and one copy of the minor allele and called as 
homozygous. To prevent these false positives, sites in which the fibroblast genome 
showed even a very small presence of minor allele were removed from consid- 
eration as candidate sites for novel mutations (as these sites are most probably 
truly heterozygous in both lines). Several locations were identified in which the 
hiPS cell sample consensus sequence showed a heterozygous call but the fibro- 
blast sample consensus sequence showed a homozygous call; these were identified 
as candidate mutations, as it is expected that during mutational processes, the 
hiPS cell sample would most probably gain an additional allele. These candidate 
mutations were then validated by capillary sequencing as below. 
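
The filtering logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SiteCall` record, its field names and the `in_dbsnp` flag are hypothetical stand-ins for whatever SAMtools or GATK output is actually parsed, and the requirement that the hiPS genotype contain the fibroblast allele is an assumption drawn from the "gain an additional allele" remark.

```python
from dataclasses import dataclass

MIN_DEPTH = 8      # minimum read depth required in the text
MIN_QUALITY = 30   # minimum consensus quality required in the text


@dataclass
class SiteCall:
    """Consensus summary for one exomic position in one sample (hypothetical layout)."""
    depth: int             # uniquely mapped, de-duplicated reads covering the site
    quality: int           # consensus quality score
    genotype: frozenset    # called alleles, e.g. frozenset("A") or frozenset("AG")
    allele_counts: dict    # base -> number of reads supporting it


def passes_thresholds(call: SiteCall) -> bool:
    return call.depth >= MIN_DEPTH and call.quality >= MIN_QUALITY


def is_candidate_mutation(fibro: SiteCall, hips: SiteCall, in_dbsnp: bool) -> bool:
    """Apply the filters from the text to one position in a fibroblast/hiPS pair."""
    # Both samples must be confidently called at this position.
    if not (passes_thresholds(fibro) and passes_thresholds(hips)):
        return False
    # Known polymorphisms are removed from consideration.
    if in_dbsnp:
        return False
    # Candidate pattern: homozygous in fibroblasts, heterozygous in hiPS cells.
    if len(fibro.genotype) != 1 or len(hips.genotype) != 2:
        return False
    # Assumption: the hiPS line keeps the fibroblast allele and gains one more.
    if not fibro.genotype <= hips.genotype:
        return False
    # Even a single minor-allele read in the fibroblast sample disqualifies the site.
    observed = {base for base, count in fibro.allele_counts.items() if count > 0}
    return observed == set(fibro.genotype)
```

A site sequenced as seven reads of one allele and one read of another in the fibroblast sample, the allelic-imbalance case mentioned in the text, is rejected by the last check even though it would be called homozygous.
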
Sanger validation of candidate mutations. Genomic DNA (6 ng) was amplified 
in a 50-µl PCR reaction with 100 nM specifically designed primers near the 
mutation site and 25 µl Taq ×2 master mix (NEB) at 94 °C for 2 min, followed 
by 35 cycles at 94 °C for 30s, 57 °C for 30s and 72 °C for 30s, and final extension 
at 72 °C for 3 min. The PCR products were then purified with Qiagen Qiaquick 
columns, and 10 ng of purified DNA was pre-mixed with 8 pmol of the sequen- 
cing primer for capillary Sanger sequencing by Genewiz. 
Clonal fibroblast experiments. In an attempt to determine the mutational load 
present in single fibroblasts, we performed a reprogramming-like clonal colony 
purification strategy on fibroblasts. CV fibroblasts were thawed at passage 14 and 
cultured in fibroblast media (DMEM containing 15% FBS, penicillin/streptomycin, 
sodium pyruvate, non-essential amino acids and L-glutamine). A confluent 6-cm 
plate was trypsinized and cells were plated in three 96-well dishes, in the presence 
(two plates) or absence (one plate) of MEF feeder cells, at limiting dilutions. 
Another 96-well plate was plated as a reference plate. Using Poisson calculations, 
cells were diluted and plated such that it was extremely unlikely (<1%) for one well 
to contain more than one cell (leading to an expectation of eight wells per plate with 
one cell). These wells were cultured and progressively passaged from the 96-well 
dish to a 6-cm plate (96-well, 48-well, 24-well, 12-well, 6-well, 6-cm). For cells 
growing on MEFs, all passages from a 12-well dish to a 6-cm dish were done 
without MEFs to minimize contamination with mouse cells in the sequencing 
analysis. Only three MEF-free wells and nine MEF-containing wells successfully 
grew; using Poisson calculations, 24 wells should have successfully grown. 
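
The Poisson reasoning behind the plating density can be checked with a short sketch. It assumes that the number of cells per well is Poisson distributed and that the stated expectation of eight single-cell wells per 96-well plate defines the mean; the function names are illustrative only.

```python
import math

WELLS_PER_PLATE = 96


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k cells in a well when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def mean_for_single_cell_fraction(target: float, tol: float = 1e-12) -> float:
    """Solve lam * exp(-lam) = target on the dilute branch (0 < lam < 1) by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid * math.exp(-mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Mean cells per well implied by eight single-cell wells per plate.
lam = mean_for_single_cell_fraction(8 / WELLS_PER_PLATE)
p_single = poisson_pmf(1, lam)
p_multi = 1 - poisson_pmf(0, lam) - p_single      # two or more cells in a well
expected_single_cell_wells = 3 * WELLS_PER_PLATE * p_single

print(f"mean cells per well: {lam:.4f}")
print(f"P(more than one cell): {p_multi:.4f}")
print(f"expected single-cell wells on three plates: {expected_single_cell_wells:.1f}")
```

Under these assumptions the chance of a well receiving more than one cell stays below the 1% bound quoted in the text, and three plates give the 24 expected single-cell wells against which the 12 wells that actually grew are compared.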

All fibroblasts grown from single cells showed heavy signs of stress. Cells grew 
very slowly (with passaging needed approximately every one to two weeks). MEF-free 
cells had a flattened morphology, whereas MEF-plated cells maintained a 
normal, spindle-shaped morphology. Cells tended to senesce very soon after 
plating; only a few cells grew successfully. Seven clonal lines were sequenced 
(three grown without MEFs and four grown with MEFs). Six of the lines con- 
tained a very high number of putative mutation candidates (~100), and no 
mutations were found in one line grown on MEFs. We randomly selected 21 of 
the 600 mutation candidates for Sanger validation, and found that approximately 
50% were true positives. This leads to a projection of ~50 protein-coding mutations 
in each of the six clonal fibroblast lines, which is tenfold more than what was observed 
in hiPS cells and not consistent with the observations on the other clonal fibro- 
blast line, which was completely mutation free. We proposed that the mutations 
in the six clonal fibroblast lines were due to the stress associated with expanding 
single fibroblast cells. Because fibroblast growing conditions are very different 
from those found in reprogramming, we cannot estimate the background somatic 
mutation rate in such an experiment. We therefore instead used published esti- 
mates of fibroblast mutation rate to estimate clonal fibroblast mutational load 
(see below). 

Digital quantification of mutations. Thirty-two pairs of DigiQ-PCR primers 
were designed such that the forward or reverse primers are roughly 25 base pairs 
away from the 5’ end of each mutation site. This ensured that the mutations of 
interest were sequenced in the part of the read length that had the highest accuracy. 
Primers also contained an annealing region for Illumina Solexa sequencing 
primers at the 5’ ends. Each primer corresponding to a different mutation was 
amplified with a high-fidelity polymerase in three samples: the mutated hiPS cell 
line, the progenitor fibroblast line and a clean control. To sample DNA from 
100,000 cells, 600 ng of DNA was used for each mutated hiPS cell line and fibro- 
blast line. In cases where a separate clonal hiPS cell line not containing the muta- 
tion in question was available, this line was used as a clean control, as the chance of 
this line acquiring the same mutation during clonal expansion is extremely low 
(~10-° for one mutation). In other cases, a ‘low-input’ sample using 300 pg of 
DNA (~50 cells) was used, as rare mutations are unlikely to be present in such a 
small quantity of DNA. If any mutated DNA was sampled, it would be immediately 
obvious in the sequencing results and the experiment could be repeated. First- 
round PCR amplification was performed with 600 ng (~100,000 cells) of DNA, 
500 nM of each DigiQ-PCR primer and ×1 iProof High-Fidelity Master Mix (Bio-Rad) 
at 98 °C for 30 s, followed by ten cycles at 98 °C for 10 s, 59 °C for 20 s and 
72°C for 15s, 18 cycles at 98 °C for 10s and 72 °C for 20s, and final extension at 
72°C for 3min. The PCR amplicons were purified using Qiaquick columns 
(Qiagen). Roughly 100 ng of the first-round PCR product was used as a template 
for second-round PCR amplification, together with ×1 Phusion High-Fidelity 
PCR Master Mix (NEB) and 200 nM of each Illumina PCR primer, at 98 °C for 
30 s, followed by ten cycles at 98 °C for 10 s and 64 °C for 30 s, and final extension at 
72 °C for 30 s. The amplicons were purified again with Qiaquick columns (Qiagen) 
and size-selected (roughly 150-200 nucleotides) using an E-Gel SizeSelect 2% 
system (Invitrogen). PCR reactions were performed with the iProof High- 
Fidelity Master Mix (Bio-Rad) and Phusion High-Fidelity PCR Master Mix 
(NEB) to minimize amplification errors. All size-selected products were pooled 
together at equal ratio; these libraries were then mixed with the Illumina PhiX 
control library in a roughly equal ratio to balance the fluorescent signals at all four 
bases and improve the base-calling accuracy, and sequenced using an Illumina GA 
IIx. Each pair of libraries from the fibroblasts and negative controls was sequenced 
in two non-adjacent lanes of the same flow cell. Extreme care was taken in sample 
handling to ensure no cross-contamination from the positive control libraries to 
the other libraries. Alleles identified at each mutation position by the sequencer 
were counted and evaluated. The specific sample choices for each mutation (and 
raw allele counts) are listed in Supplementary Table 2 (for details, see 
Supplementary Fig. 3 and Supplementary Table 3). To verify the robustness of 
the DigiQ assay, the assay was repeated on CV fibroblasts. The obtained read 
proportions were extremely similar (Supplementary Fig. 4). 
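
The DNA inputs quoted for this assay map onto cell numbers through a single conversion factor. The sketch below assumes about 6 pg of genomic DNA per diploid human cell, which is the value implied by the pairing of 600 ng with ~100,000 cells and 300 pg with ~50 cells in the text; the function name is illustrative.

```python
PG_PER_DIPLOID_CELL = 6.0   # assumption: implied by 600 ng ~ 100,000 cells in the text


def cell_equivalents(dna_ng: float) -> float:
    """Number of diploid genomes represented by a given mass of genomic DNA."""
    return dna_ng * 1000.0 / PG_PER_DIPLOID_CELL


print(cell_equivalents(600))    # standard input for hiPS and fibroblast samples
print(cell_equivalents(0.3))    # 'low-input' control (300 pg)
```

A mutation carried by roughly one cell in 100,000 would therefore contribute about one template copy to the standard input and would be expected to be absent from the low-input control.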

Statistical analysis—probability of mutations occurring naturally. We evalu- 
ated the likelihood that the mutations found were generated during fibroblast 
culturing and reprogramming (assuming a clean starting population of fibroblasts) 
at the normal estimated somatic mutation rate of between 10⁻⁶ and 10⁻⁷ 
non-synonymous coding mutations per gene per cell division, which corresponds 
to a rate of 6.7 × 10⁻¹⁰ per base pair per cell division (taking the upper estimate 
of 10⁻⁶ and the average human coding-region size of 1,500 base 
pairs per gene). Assuming that mutations are independent events that occur 
uniformly across the genome, the number of expected mutations during fibroblast 
culturing and reprogramming can be estimated using a Poisson distribution 
with expected value λ = 6.7 × 10⁻¹⁰ × n × s, where n is the number of cell divisions 
and s is the number of base pairs sequenced. Although accurate records of the number of cell 
divisions experienced by each line during expansion and reprogramming are not 
available, we estimated that 30-35 doublings had occurred for six lines with well-documented 
culture histories. In these lines, a total of 206,227,380 base pairs were 
pairwise-sequenced (at a depth of at least eight and quality of at least 30). This 
leads to a Poisson distribution with λ = 4.13-4.81 for the expected number of 
mutations. In this case, we observed 54 coding mutations, leading to a P value of 
1.29 × 10⁻⁴⁰ to 2.72 × 10⁻³⁷. If this calculation is extrapolated to all 22 lines, we 
expect λ = 8.7-10.1 coding mutations; we observed 91, leading to a P value of 
4.29 × 10⁻⁵⁹ to 1.27 × 10⁻⁵³. We can therefore say that these mutations did not 
occur by chance with more than 99% confidence for all 22 lines. 
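
The Poisson calculation for the six well-documented lines can be reproduced approximately with the standard library alone. The sketch assumes that the per-base rate is the upper per-gene estimate (taken here as 10⁻⁶) divided by 1,500 base pairs, and that the P value is the upper-tail probability of seeing at least the observed number of mutations; neither detail is spelled out in the text, and the function names are illustrative.

```python
import math

PER_GENE_RATE = 1e-6         # assumption: upper estimate, per gene per cell division
CODING_BP_PER_GENE = 1500    # average coding-region size quoted in the text
PER_BP_RATE = PER_GENE_RATE / CODING_BP_PER_GENE   # about 6.7e-10

BASES_SEQUENCED = 206_227_380   # pairwise-sequenced bases in the six lines
OBSERVED_MUTATIONS = 54


def expected_mutations(divisions: int, bases: int, rate: float = PER_BP_RATE) -> float:
    """Poisson mean: rate per base per division x divisions x bases."""
    return rate * divisions * bases


def poisson_upper_tail(k_observed: int, lam: float, extra_terms: int = 500) -> float:
    """P(X >= k_observed) for X ~ Poisson(lam), with each term built in log space."""
    total = 0.0
    for k in range(k_observed, k_observed + extra_terms):
        log_pmf = -lam + k * math.log(lam) - math.lgamma(k + 1)
        total += math.exp(log_pmf)
    return total


for divisions in (30, 35):
    lam = expected_mutations(divisions, BASES_SEQUENCED)
    p_value = poisson_upper_tail(OBSERVED_MUTATIONS, lam)
    print(f"{divisions} divisions: lambda = {lam:.2f}, P = {p_value:.3g}")
```

Because the rate is not rounded to 6.7 × 10⁻¹⁰ here, the expected values and P values differ slightly in the last digit from those reported, but they fall in the same range.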

Statistical analysis—digital quantification. To quantify the frequency of each 
mutation in the fibroblast samples, a one-tailed binomial distribution test was 
used. Reads were quality-filtered; only base calls with a Phred-like quality score of 
30 or greater were considered. We denote by p the probability of obtaining a 
sequencing read containing the minor allele. The fibroblast sample was compared 
with either the clean low-input sample or a clean clonal hiPS cell line. Because the 
two hiPS cell lines are clonally independent, they will not share any mutations. 
Therefore, for example, FS-low can be used as a negative control for FS and CV- 
hiPS-B can be used as a negative control for CV-hiPS-F. Any minor allele 
obtained from the clonal hiPS cell or low-input fibroblast sample will be purely 
due to sequencing error. We denote by H0 the event that the minor allele frequency 
in the fibroblast sample was less than or equal to the minor allele frequency 
in the other clonal/low-input sample, and denote by H1 the event that the 
minor allele frequency in the fibroblast sample was greater. If H0 is found to be 
true, the mutation cannot be detected in the fibroblast, as any presence of the 
minor allele cannot be distinguished from sequencing error. If H1 is found to be 
true, the presence of the minor allele is detectable and can be quantified. We 
denote by n the total number of reads that called the mutated position. A critical 
value of α = 0.01 was chosen (99% confidence). Because the number of reads for 
each sample was very high, both np and n(1 − p) were greater than five, meaning 
that the minor allele presence could be approximated with a normal distribution. 
We can therefore set a criterion for rejection of the null hypothesis of Z = (x − μ)/s 
> 2.33, where x is the minor allele count, μ is the mean of the minor allele counts 
of the fibroblast and low-input/clonal samples, and s is the standard deviation of 
the minor allele counts of the fibroblast and low-input/clonal samples. For a 
binomial-distribution approximation, n is the number of reads in the fibroblast 
sample, p is the minor allele frequency if the fibroblast and low-input/clonal data 
are merged, μ = np and s = √(np(1 − p)). If the value of Z is greater than 2.33, we are 
capable of distinguishing the observed fraction of minor alleles in the fibroblast 
sample from that observed in the clonal/low-input sample. These results are 
presented in Supplementary Table 3. 
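
A minimal sketch of this one-tailed test follows. It assumes that s is the binomial standard deviation √(np(1 − p)) computed from the pooled frequency, and the read counts in the example are invented purely for illustration; they are not taken from the supplementary tables.

```python
import math

Z_CRITICAL = 2.33   # one-tailed critical value for alpha = 0.01, as in the text


def minor_allele_z(x_fibro: int, n_fibro: int, x_control: int, n_control: int) -> float:
    """Z score for the fibroblast minor-allele count under the pooled frequency."""
    p = (x_fibro + x_control) / (n_fibro + n_control)   # merged minor-allele frequency
    mu = n_fibro * p                                    # expected minor-allele reads
    s = math.sqrt(n_fibro * p * (1 - p))                # binomial standard deviation
    if s == 0:
        return 0.0                                      # no minor-allele reads at all
    return (x_fibro - mu) / s


def normal_approximation_ok(n: int, p: float) -> bool:
    """Rule of thumb used in the text: np and n(1 - p) both greater than five."""
    return n * p > 5 and n * (1 - p) > 5


def is_detectable(x_fibro: int, n_fibro: int, x_control: int, n_control: int) -> bool:
    return minor_allele_z(x_fibro, n_fibro, x_control, n_control) > Z_CRITICAL


# Hypothetical counts: 150 minor-allele reads of 100,000 in fibroblasts,
# 50 of 100,000 in the clean control.
z = minor_allele_z(150, 100_000, 50, 100_000)
print(round(z, 2), is_detectable(150, 100_000, 50, 100_000))
```

With equal minor-allele counts in both samples the Z score is zero and the mutation is treated as indistinguishable from sequencing error.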

We can also construct a 99% confidence interval using the normal approxi- 
mation for the binomial distribution. Although we observed a value for the minor 
allele in each fibroblast sample, due to sequencing error, this value may overestimate 
or underestimate the true minor allele frequency. We can counteract this 
error using a normal distribution. The confidence-interval values are derived 
from the normal probability density function and represent the boundaries that 
we are 99% sure the true minor allele frequency lies within: lower bound, 
max((x − 2.57s)/n, 0); upper bound, min((x + 2.57s)/n, 1). These estimates 
for the minor allele fraction in fibroblasts are shown in Supplementary Table 3. 
An example of calculation is shown in Supplementary Note. 
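
The interval can be sketched as below. Two points are assumptions rather than statements from the text: the bounds are clamped to the valid frequency range of 0 to 1, and, when no standard deviation is supplied, s is computed from the sample's own observed frequency. The counts are hypothetical.

```python
import math

Z_99 = 2.57   # two-sided 99% normal quantile as used in the text


def minor_allele_ci(x: int, n: int, s: float = None, z: float = Z_99):
    """99% interval for the minor-allele frequency from x minor reads out of n."""
    if s is None:
        p_hat = x / n
        s = math.sqrt(n * p_hat * (1 - p_hat))   # assumption: s from this sample alone
    lower = max((x - z * s) / n, 0.0)            # a frequency cannot be negative
    upper = min((x + z * s) / n, 1.0)            # ...or exceed one
    return lower, upper


low, high = minor_allele_ci(150, 100_000)        # hypothetical read counts
print(f"{low:.5f} to {high:.5f}")
```

For very small counts the lower bound is clamped at zero, which corresponds to a mutation whose presence in the fibroblast population cannot be established.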

Statistical analysis—NS/S mutation ratio. To determine whether selection 
pressure could have a role in reprogramming-associated mutations, we compared 
the mutational load associated with reprogramming with that associated with 
tumorigenesis. The NS/S ratio reported in several previously conducted pairwise 
cancer sequencing analyses was 2.4:1. The ratio found here among the 
124 identified mutations is 2.6:1, meaning that hiPS cell lines carry a very similar 
mutational pattern to cancer lines. 

Statistical analysis—pathway and COSMIC gene enrichment. To check for 
enrichment of reprogramming-associated mutated genes in cancer-related genes, 
the fraction of genes mutated in hiPS cells that are also found mutated in the COSMIC 
database was identified as 50/124. As 4,471 of the 16,017 genes well targeted by 
our exome sequencing pipeline are considered to be commonly mutated in cancer, 
a χ² test with one degree of freedom can be used to test for equivalency of distribution. 
The obtained χ² value is 9.67, indicating that the fraction of mutated hiPS 
cell genes in the COSMIC set is statistically significantly greater than the normally 
obtained number with a P value of 0.001873. This indicates that hiPS cell mutations 
are enriched in COSMIC set genes at approximately 1.5-fold the normal level, of 
28%, with >99% confidence. To check for commonly mutated pathways, reprogramming-associated 
mutated genes and mutated genes identified in three cancer 
sequencing papers were analysed using DAVID. No statistically significant 
pathway Gene Ontology terms were identified; the lowest Benjamini P value found 
was 0.6, which is well above the cut-off value, of 0.01, required for 99% confidence. 
Therefore, no common pathways seem to be mutated in hiPS cells. 
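
The COSMIC enrichment test can be approximated with a two-category goodness-of-fit calculation. The text does not state which form of the χ² test was used, so the statistic from this sketch is close to, but need not equal, the reported 9.67; the function names are illustrative.

```python
import math

HIPS_GENES = 124              # mutated genes in hiPS cells
HIPS_IN_COSMIC = 50           # of which found in COSMIC
TARGETED_GENES = 16_017       # genes well targeted by the pipeline
TARGETED_IN_COSMIC = 4_471    # of which found in COSMIC


def chi_square_two_categories(observed_hits: int, total: int, expected_fraction: float) -> float:
    """Goodness-of-fit statistic for hits vs misses against an expected fraction."""
    expected_hits = total * expected_fraction
    expected_misses = total - expected_hits
    observed_misses = total - observed_hits
    return ((observed_hits - expected_hits) ** 2 / expected_hits
            + (observed_misses - expected_misses) ** 2 / expected_misses)


def chi_square_p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


background = TARGETED_IN_COSMIC / TARGETED_GENES          # about 28%
chi2 = chi_square_two_categories(HIPS_IN_COSMIC, HIPS_GENES, background)
fold = (HIPS_IN_COSMIC / HIPS_GENES) / background

print(f"background fraction: {background:.3f}")
print(f"chi-square: {chi2:.2f}, P = {chi_square_p_value_1df(chi2):.4f}")
print(f"fold enrichment: {fold:.2f}")
```

Passing the reported statistic of 9.67 to `chi_square_p_value_1df` gives a value of approximately 0.0019, in line with the P value quoted in the text.
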
Hotspots of aberrant epigenomic 
reprogramming in human induced 
pluripotent stem cells 
Induced pluripotent stem cells (iPSCs) offer immense potential for regenerative medicine and studies of disease and 
development. Somatic cell reprogramming involves epigenomic reconfiguration, conferring iPSCs with characteristics 
similar to embryonic stem (ES) cells. However, it remains unknown how complete the reestablishment of ES-cell-like 
DNA methylation patterns is throughout the genome. Here we report the first whole-genome profiles of DNA 
methylation at single-base resolution in five human iPSC lines, along with methylomes of ES cells, somatic cells, and 
differentiated iPSCs and ES cells. iPSCs show significant reprogramming variability, including somatic memory and 
aberrant reprogramming of DNA methylation. iPSCs share megabase-scale differentially methylated regions proximal to 
centromeres and telomeres that display incomplete reprogramming of non-CG methylation, and differences in CG 
methylation and histone modifications. Lastly, differentiation of iPSCs into trophoblast cells revealed that errors in 
reprogramming CG methylation are transmitted at a high frequency, providing an iPSC reprogramming signature 
that is maintained after differentiation. 


Generation of iPSCs from somatic cells offers tremendous potential for 
therapeutics, the study of disease states, and elucidation of developmental 
processes. iPSC production techniques introduce active 
genes that are necessary for pluripotency, or their derivative RNA or 
protein products, into a somatic cell to induce pluripotent cellular 
properties that closely resemble those of ES cells. Indeed, iPSCs have 
been used to produce viable and fertile adult mice, demonstrating their 
pluripotent potential to form all adult somatic and germline cell 
types. 

The reprogramming process by which a somatic cell acquires pluri- 
potent potential is not a genetic transformation, but an epigenomic 
one. A recent study reported minimal differences in chromatin structure 
and gene expression between human ES cells and iPSCs, indicating 
that ES cells and iPSCs are nearly identical cell types. On the 
other hand, there are recent reports indicating epigenomic differences 
between ES cells and iPSCs and alterations in the differentiation 
potential of iPSCs compared to ES cells. Together, these findings 
indicate that fundamental differences between ES cells and iPSCs exist, 
prompting the question of how complete and how variable the reestablishment 
of ES-cell-like DNA methylation patterns is throughout the 
entire genome. 

Presumably, optimal reprogramming of somatic cells to a pluripo- 
tent state requires complete reversion of the somatic epigenome into 
an ES-cell-like state, but until now a comprehensive survey of the 
changes in such epigenetic marks in a variety of independent iPSC 
lines has not been reported. Accordingly, we have performed whole-genome 
profiling of the DNA methylomes of multiple human ES cell, 
iPSC and somatic progenitor lines, encompassing reprogramming 
performed in different laboratories, using different iPSC-inducing 
technologies and cells derived from distinct germ layers. We show that 
although on a global scale ES cell and iPSC methylomes are very 
similar, every iPSC line shows significant reprogramming variability 
compared to both ES cells and other iPSCs, including both somatic 
‘memory’ and iPSC-specific differential DNA methylation. Further, all 
iPSC lines share numerous non-randomly distributed megabase-scale 
regions that are aberrantly methylated in the non-CG context, asso- 
ciated with alterations in CG methylation, histone modifications and 
gene expression. Lastly, we show that differentially methylated regions 
in iPSCs are transmitted to differentiated cells at a high frequency. 


Globally similar ES cell and iPSC methylomes 


To assess the degree to which a somatic cell DNA methylome is repro- 
grammed into an ES-cell-like state by induction of a pluripotent state, 
we generated whole-genome, single-base resolution DNA methylomes 
of a range of human cell types using the shotgun bisulphite-sequencing 
method, MethylC-Seq. Our central focus was a high-efficiency, feeder-free 
reprogramming system, in which female adipose-derived stem 
cells (ADS) were reprogrammed into a pluripotent state by retroviral 
transformation with the OCT4, SOX2, KLF4 and MYC genes (ADS-iPSCs), 
satisfying the criteria for pluripotency in human cells. 
Additionally, we analysed the DNA methylome of adipocytes derived 
from the ADS cells (ADS-adipose) through adipogenic differentiation 
conditions. Further, to explore the variation between independent iPSC 
lines potentially due to stochastic reprogramming events, progenitor 
somatic cell type, reprogramming technique and laboratory-specific 
effects, we generated full DNA methylomes for four additional iPSC 
lines that were isolated in an independent laboratory: an iPSC line 
generated by lentiviral integration of the OCT4, SOX2, NANOG and 
LIN28A genes into IMR90 lung fibroblasts (IMR90-iPSCs), and three 
iPSC lines generated by reprogramming of foreskin fibroblasts (FF) by 
non-integrating episomal vectors (FF-iPSC 6.9, FF-iPSC 19.7, FF-iPSC 
19.11), as described previously. We also sequenced the DNA methylome 
of the somatic FF progenitor line. Lastly, to study the effects of 
cellular differentiation on the DNA methylomes of ES cells and iPSCs, 
we differentiated cells of each to trophoblast lineage cells by growth in 
the presence of bone morphogenetic protein 4 (BMP4). High sequence 
coverage of the 11 new base-resolution DNA methylomes allowed 
interrogation of 75.7-94.5% of the genomic cytosines (Fig. 1a and 
Supplementary Table 1). 

The genome-wide frequency of DNA methylation at both CG and 
non-CG (mCH, where H = A, C or T) sites indicated that iPSCs 
resemble ES cells and are distinct from somatic cells. All ES cell and 
iPSC lines were methylated at CG dinucleotides at a higher frequency 
compared to the somatic cell lines (Fig. 1b), consistent with the global 
partially methylated state previously observed in the IMR90 fibroblast 
genome. Similarly, whereas somatic cells contained negligible levels 
of cytosine methylation in the non-CG context, all pluripotent cells 
harboured significant mCH at a similar frequency (Fig. 1c), account- 
ing for 20-30% of detected DNA methylation events in the genome. 
As observed in ES cells, all iPSC genomes showed enrichment for 
mCH in genes (Fig. 1d). On a genome scale the DNA methylomes of 
ES cells and iPSCs are similar to one another and highly distinct from 
the primary somatic cell lines, including the adult stem cell ADS line, 
and this relationship agrees with clustering of cell types based on 
transcriptional activity (Fig. 1e and Supplementary Fig. 1a, b). 
Analysis of DNA methylation patterns at enhancers, transcription- 
factor-binding sites and pluripotency-related genes confirmed the 
previously reported methylation patterns (Supplementary Figs 2-6). 
Taken together, these data indicate that, on the genome scale and at 
these crucial genomic regions, iPSC and ES cell DNA methylomes 
closely resemble one another. 

Figure 1 | Global trends of human iPSC and ES cell DNA methylomes. 
a, Per cent of all cytosines on each strand of the human genome assayed for each 
sample. b, c, The per cent of all sequencing base calls that were methylated (C, 
resistant to bisulphite conversion) at covered C bases in the CG (b) and CH 
contexts (c) (where H = A, C, or T) throughout the genome, minus the 
bisulphite non-conversion frequency. d, AnnoJ data browser representation of 
the restoration of non-CG methylation in all iPSC and ES cell lines. 
e, Dendrogram of the analysed cell lines based on Pearson correlation of mCG 
or mCH levels in 1-kb windows throughout the genome. 
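
The comparison behind the dendrogram in Fig. 1e rests on Pearson correlation of per-window methylation levels between cell lines. The sketch below shows that step only, not the clustering itself. The window values are invented, the helper names are illustrative, and flooring the corrected level at zero is an assumption.

```python
import math


def window_methylation(methylated_calls: int, total_calls: int, non_conversion: float):
    """Methylation level of one window, minus the bisulphite non-conversion frequency."""
    if total_calls == 0:
        return None                      # window not covered
    level = methylated_calls / total_calls - non_conversion
    return max(level, 0.0)               # assumption: negative levels are floored at zero


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_distances(profiles: dict) -> dict:
    """Pairwise distance 1 - r between named per-window methylation profiles."""
    names = sorted(profiles)
    return {(a, b): 1 - pearson(profiles[a], profiles[b])
            for i, a in enumerate(names) for b in names[i + 1:]}


# Invented mCG levels in five 1-kb windows for three hypothetical samples.
profiles = {
    "ES": [0.85, 0.80, 0.10, 0.90, 0.75],
    "iPSC": [0.83, 0.78, 0.12, 0.88, 0.70],
    "somatic": [0.60, 0.30, 0.15, 0.85, 0.20],
}
distances = correlation_distances(profiles)
print(distances)
```

A distance matrix of this kind is the usual input to hierarchical clustering, which would then place the most highly correlated samples on neighbouring branches.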

We discovered previously that 40% of the genome of IMR90 fibro- 
blasts was in a partially methylated state, termed partially methylated 
domains (PMDs). The DNA methylomes of the primary somatic cell 
lines we have profiled here also contain PMDs in a similar proportion 
of the genome to IMR90 cells (Fig. 2a). As observed previously in 
IMR90 cells, the transcript abundance associated with genes located 
within PMDs was lower than the average for all other genes (Fig. 2b). 
Notably, these PMDs were transformed to a fully methylated state in 
the CG context by induction of a pluripotent state (Fig. 2a and 
Supplementary Fig. 7). Lastly, the reprogramming process was able 
to reverse the transcriptional repression associated with the PMD 
state (Fig. 2b). 


mCG somatic memory and aberrant reprogramming 


Although global patterns of DNA methylation in the CG context 
appeared very similar between ES cells and iPSCs (Figs 1 and 2), a 
comprehensive analysis of CG DNA methylation between all ES cell 
and iPSC lines identified 1,175 differentially methylated regions (CG- 
DMRs) that were differentially methylated in at least one iPSC or ES 
cell line (1% false discovery rate (FDR); Fig. 3a and Supplementary 
Table 2) and in total comprised 1.68 Mb ranging from 1-11 kb in 
length. Importantly, identification of CG-DMRs between the H1 
and H9 ES cells with the same criteria (1% FDR) provided no results 
(see Supplementary Methods for details). Whereas mCG patterns 
within each category of cells (ES cell, iPSC, somatic) were generally 
consistent and distinct from the cells in each other category, indi- 
vidual cell lines showed some variability. 

Figure 2 | Partially methylated domains become highly methylated on 
induction of pluripotency. a, Total length of PMDs identified in each cell line 
and overlap of PMDs identified in the four somatic cell types. b, mRNA-Seq 
RPKM (reads per kilobase of exon per million reads) values for all RefSeq genes 
outside PMDs, and all RefSeq genes within genomic regions defined as PMDs. 
For ADS-iPSC and H1 the ADS PMD genomic regions were used as PMDs. P 
value is from two-tailed Wilcoxon test between ADS PMDs and ADS-iPSC 
PMDs. 

Figure 3 | CG-DMRs identified between pluripotent cells. a, Complete 
linkage hierarchical clustering of mCG density within CG-DMRs identified 
between all ES cell and iPSC DNA methylomes. Each CG-DMR was profiled 
over 20 equally sized bins. b, The CG-DMRs for each iPSC line with respect to 
H1 and H9 ES cells were categorized as having methylation patterns like the 
progenitor somatic cell line (memory) or iPSC-specific (iDMR). c, Number of 
iPSC hypomethylated and hypermethylated CG-DMRs aberrant in the 
indicated number of iPSC lines. d, Number of all CG-DMRs coincident with 
indicated genomic and genic features. CGI, CG island; TES, transcriptional end 
site; TSS, transcriptional start site. 

DNA methylation at CG islands proximal to gene promoters and 
transcriptional start sites is inhibitory to transcriptional activity”. To 
address whether highly methylated CG islands in differentiated cells 
can be demethylated during iPSC reprogramming, we analysed CG- 
DMRs between the ES cells and somatic cells (1% FDR, twofold enrich- 
ment) that overlapped with CG islands. Of 3,507 CG-DMRs coincident 
with CG islands (CGI-DMRs), 1,904 and 374 were hypermethylated in 
ES cells and somatic cells, respectively. Of the 374 CGI-DMRs hyper- 
methylated in somatic cells, 94% were hypomethylated in the iPSCs and 
were similar to ES cells (Supplementary Fig. 8). Of the 1,904 CGI- 
DMRs hypermethylated in ES cells, 83% were hypermethylated, similar 
to ES cells, in the iPSCs (Supplementary Fig. 9). Together, these results 
indicate that CG islands in iPSCs are predominantly reprogrammed to 
an ES-cell-like state and, in particular, hypermethylated CG islands are 
not especially resistant to reprogramming. 

CG-DMRs identified between iPSCs and ES cells may be categorized 
as either a failure to reprogram the progenitor somatic cell methylation 
patterns (somatic ‘memory’) or iPSC-specific DMRs (iDMRs) that are 
not observed in the progenitor somatic cells and ES cells. A recent study 
reported the retention of somatic cell DNA methylation patterns in 
early-passage (passage 4) mouse iPSCs that was sufficient to distin- 
guish between iPSC lines derived from different progenitor cell types, 
and which was subsequently attenuated after further passages (10-16 
in total)’*. However, the iPSCs analysed here included relatively late- 
passage iPSC lines (15-65 passages; Supplementary Table 1), indi- 
cating that we are able to discriminate somatic DNA methylation 
patterns in iPSCs that are resistant to resetting to an ES-cell-like state. 
Comparison of iPSC lines to their respective progenitors revealed that 
44-49% of CG-DMRs were aberrant with respect to ES cells 
(P value < 0.05) and reflected memory of the progenitor methylation 
state (Fig. 3b and Supplementary Fig. 10). Accordingly, 51-56% of the 
iPSC CG-DMRs could be classified as iDMRs, reflecting a methylation 
state dissimilar to the respective progenitor somatic cell and both ES 
cell lines (Fig. 3b and Supplementary Fig. 10). 

Inspection of the concordance of methylation states in the five iPSC 
lines showed that 69% of the CG-DMRs were aberrant with respect to 
the ES cells in at least two iPSC lines, with 16% being confirmed in all five 
iPSC lines (Fig. 3c and Supplementary Table 3). The majority of CG- 
DMRs (80%) occurred at CG islands, and to a lesser extent near or within 
genes (62%), with 29% and 19% located within 2 kb of transcriptional 
start and end sites, respectively (Fig. 3d). Analysis of biological processes 
attributed to genes proximal to CG-DMRs in each line or common to all 
iPSC lines did not identify any enrichment of specific processes, indi- 
cating that disruption of the normal regulation of these genes could 
affect many aspects of cellular function. Closer inspection of the CG- 
DMRs confirmed in all five iPSC lines revealed that the vast majority of 
them (119 of 130, or 92%) were hypomethylated in the iPSC lines, 
indicating that the general deficiency in resetting DNA methylation 
patterns during reprogramming is insufficient methylation. Notably, 
the remaining 11 CG-DMRs hypermethylated in all iPSC lines were 
iDMRs, as they are not differentially methylated in the progenitor cells 
compared to the ES cells. In addition, they were associated with tran- 
scriptional repression and the absence of the heterochromatic 
H3K27me3 histone modification, compared to H1 ES cells (Fig. 4a, b). 

The genome sequences at the CG-DMRs present in all iPSC lines were 
analysed to identify motifs that could be associated with the altered DNA 
methylation states. Binding sites for two human transcription factors 
were identified in sequences conserved over the DMRs, corresponding 
to the reprogramming factor KLF4 and the chromatin-remodelling 
factor FOXL1 (Supplementary Fig. 11). Given that KLF4 has previously 
been found to bind to the promoter of FAM19A5 in H1 ES cells at 
precisely the same genomic position as one of the 11 hypermethylated 
iDMRs shared between all iPSC lines”’, it is tempting to speculate that 
development of the conserved aberrant methylation states in the iPSC 
lines may be related to altered expression of the endogenous and/or 
introduced copy of KLF4 during the reprogramming process. 

By differentiation of both H1 and FF-iPSC 19.11 cells into tropho- 
blast lineage cells with BMP4, we were able to determine the frequency 
at which CG-DMRs in iPSCs were transmitted through differenti- 
ation. We identified 140 hypomethylated (Fig. 4c) and 70 hyper- 
methylated (Fig. 4d) CG-DMRs present in both FF-iPSC 19.11 cells 
and FF-iPSC 19.11-BMP4 trophoblasts with respect to H1 and H9 ES 
cells, and H1-BMP4 trophoblasts. A high proportion of the CG- 
DMRs in FF-iPSC 19.11 cells relative to both ES cell lines were trans- 
mitted through the differentiation process, with 88% and 46% of 
hypermethylated and hypomethylated CG-DMRs, respectively, still 
present in FF-iPSC 19.11-BMP4 trophoblasts but not in H1-BMP4 
trophoblasts (Fig. 4e). These transmitted CG-DMRs were comprised 
of both somatic memory (Fig. 4e and Supplementary Fig. 12) and 
iDMR (Fig. 4e and Supplementary Fig. 13) classes. Notably, 9 of 11 
hypermethylated and 57 of 119 hypomethylated CG-DMRs present in 
all iPSC lines were transmitted to the FF-iPSC 19.11-BMP4 tropho- 
blast cells. 

The 1,175 CG-DMRs identified between iPSCs and ES cells and the 
iPSC conserved CG-DMRs were profiled and confirmed in two previ- 
ously reported ES cell DNA methylomes, HSF1 (ref. 23) and H9- 
Laurent (ref. 24) (Supplementary Fig. 14). Hierarchical clustering of 
the 1,175 CG-DMRs indicated that HSF1 and H9-Laurent ES cells 
are similar to H1 and H9. Lastly, we find that all of the iPSC hyper- 
methylated CG-DMRs and 75% of the iPSC hypomethylated CG- 
DMRs are confirmed with respect to the two additional ES cell lines 
(P value < 0.05, as for H1 and H9). 

Several conclusions can be made from this catalogue of CG-DMRs. 
First, reprogramming a somatic cell to a pluripotent state generates 
hundreds of aberrantly methylated loci, predominantly at CG islands 
and associated with genes. Second, whereas insufficient reprogram- 
ming manifested as a memory of the progenitor somatic cell methyla- 
tion state is common, a high incidence of iDMRs unlike both the 
progenitor somatic cell and ES cells indicates that aberrant methyla- 
tion patterns dissimilar to both the start and endpoints of the repro- 
gramming process are frequently generated. Third, although there is 
variability in the loci that are differentially methylated between iPSC 
lines, a high proportion of CG-DMRs are found in multiple independ- 
ent iPSC lines, indicating that these regions have a strong propensity 
to be insufficiently or aberrantly reprogrammed. Fourth, a core set of 
CG-DMRs was present in every iPSC line, representing hotspots of 
failed epigenomic reprogramming common to iPSCs. Fifth, both 
memory CG-DMRs and iDMRs are transmitted through differenti- 
ation of the iPSCs at a high frequency, indicating that the disrupted 
DNA methylation states are not simply a transient aberration during 
the pluripotent state. The identification of hundreds of CG-DMRs 
that cannot be erased by passaging and are frequently transmitted 
through cellular differentiation has immediate consequences for the 
derivation and use of iPSCs. 

Figure 4 | Characterization of CG-DMRs in iPSCs. a, Normalized mCG 
levels (lower y-axis) and normalized H3K27me3 ChIP-Seq read density (upper 
y-axis) over CG-DMRs hypermethylated in all iPSC lines and flanking genomic 
regions. b, Data browser representation of mRNA, DNA methylation and 
H3K27me3 density for a CG-DMR identified in all iPSC lines. c, Complete 
linkage hierarchical clustering of mCG density within the CG-DMRs 
hypomethylated in both FF-iPSC 19.11 and FF-iPSC 19.11-BMP4 relative to 
H1, H9 and H1-BMP4 cell lines. Each CG-DMR was profiled over 20 equally 
sized bins. d, Same as c for hypermethylated CG-DMRs. e, FF-iPSC 19.11 CG- 
DMR transmission through differentiation to trophoblast cells. CG-DMRs 
were categorized by methylation state relative to the ES cells (hyper, 
hypermethylated; hypo, hypomethylated), similarity to somatic progenitor 
methylation (memory: like progenitor; iDMR: unlike progenitor), and whether 
the CG-DMR was present in FF-iPSC 19.11 differentiated into trophoblast cells 
with BMP4 (transmitted) or not (not transmitted). 


Megabase-scale regions of aberrant non-CG methylation 


Although non-CG DNA methylation levels and distribution were 
very similar between ES cells and iPSCs on a whole-genome and 
chromosomal scale (Fig. 1), a systematic comparison of non-CG 
methylation levels between the H1 and the ADS-iPSC lines through- 
out the autosomes revealed the presence of 29 large, non-CG differ- 
entially methylated regions (FDR = 1%; Supplementary Table 4). 
These non-CG ‘mega’-DMRs tended to be very large, with half greater 
than 1 Mb in length, the longest ~4.8 Mb, and in total all 29 made up 
32.4 Mb (Fig. 5a, inset). The majority of non-CG mega-DMRs were 
hypomethylated in the mCH context in the ADS-iPSC line (22 of 29, 
total length = 29.1 Mb; Supplementary Fig. 15a, b). The H1 hypo- 
methylated non-CG mega-DMRs contained 36 genes enriched for 
biological processes related to epidermal cell differentiation (54% of 
36 genes; P value = 1.5 10 *), and that predominantly were not 
expressed in H1 cells but were transcribed at a low level in ADS-iPSCs 
(Supplementary Table 5). Focusing subsequent analysis on the 22 
non-CG mega-DMRs hypomethylated in the ADS-iPSC line com- 
pared to the H1 line, we discovered that non-CG mega-DMR locali- 
zation was strongly biased towards close proximity to centromeres 
and telomeres (Fig. 5a; Poisson Pvalue = 1X 10 eh indicating 
that somatic cell reprogramming may be susceptible to DNA methyla- 
tion abnormalities in these chromosomal regions. We did not find 
evidence that the retroviral insertions used to introduce the pluripo- 
tency factors in ADS-iPSCs was associated with the altered reprogram- 
ming of DNA methylation (Supplementary Fig. 16 and Supplementary 
Table 6). 

Profiling non-CG DNA methylation levels throughout the 22 ADS- 
iPSC hypomethylated mega-DMRs for each ES cell and iPSC line, we 
found that depletion of non-CG methylation was a common feature 
of the independent iPSC lines (Fig. 5b, Supplementary Figs 1b, 17 and 
Supplementary Table 4). We proposed that the localized failure to 
restore non-CG methylation in these large regions could be mechani- 
stically linked to the presence of particular covalent histone modifica- 
tions that impart a regional chromatin conformation that is refractive 
to remethylation at CH sites during reprogramming. Indeed, we iden- 
tified significant regional enrichment of trimethylation of histone H3 
lysine 9 (H3K9me3) in two iPSC lines” that was spatially concordant 
with the non-CG mega-DMRs, and absent in H1 ES cells (Fig. 5c). The 
IMR90 genome also showed enrichment of H3K9me3 highly spatially 
correlated with the non-CG mega-DMRs. Additionally, we found that 
the non-CG mega-DMRs tend to be partially methylated in the CG 
context in non-pluripotent cells (99.5% of non-CG mega-DMR bases 
are partially methylated in ADS cells; Fig. 5d). Taken together, these data 
indicate that specific large regions of somatic cell genomes proximal to 
centromeres and telomeres that are in the partially methylated mCG 
state, and that bear the heterochromatin modification H3K9me3, may 
often be resistant to complete reprogramming of non-CG methylation 
to the embryonic state, remaining in a somatic configuration after 
induction of pluripotency (exemplified for one DMR in Fig. 5e). 

To determine if the non-CG mega-DMRs affected disruption of 
transcriptional activity, we compared the transcript abundance 
between ADS-iPSCs and H1 ES cells of genes located within these 
regions (Fig. 5f). Of the 50 RefSeq genes within the non-CG mega- 
DMRs, 33 showed ≥2-fold lower transcript abundance in ADS-iPSCs 
compared to H1 ES cells (Supplementary Table 7). This indicates that 
non-CG mega-DMRs are associated with transcriptional disruption 
in the iPSCs (Fig. 5g). Notably, 10 of the 11 iDMRs that were con- 
sistently hypermethylated in every iPSC line (Fig. 4a, b) were located 
within the non-CG mega-DMRs (P = 8.5 X 10 °°), but this was not 
true of any of the common hypomethylated CG-DMRs. Further, 9 of 
these 10 consistently hypermethylated iDMRs located in non-CG 
mega-DMRs were transmitted to the trophoblast cells derived from 
the FF-iPSC 19.11 line. Lastly, 64% of genes with lower transcript 
abundance in ADS-iPSCs in non-CG mega-DMRs also showed dense 
CG hypermethylation at the transcriptional start site (Fig. 5f, red 
circles), a subset of which were consistently hypermethylated at the 
transcriptional start site in all iPSC lines analysed and associated with 
aberrant loss of H3K27me3 (Fig. 5f, blue circles, Fig. 4b) providing 
potential molecular markers for determination of complete repro- 
gramming in iPSC lines. Several of these suppressed genes showing 
transcriptional start site CG hypermethylation encode proteins that 
may be pertinent to neural processes: TMEM132D*°, FAM19A5”, 
TCERG1L* and FZD10. Notably, TCERG1L and FAM19A5 were 
reported to be consistently expressed significantly higher in ES cells 
compared to iPSCs” (J.A.T., personal communication). 

Figure 5 | Failure to restore megabase-scale regions of non-CG methylation 
is a hallmark of iPSC reprogramming. a, Chromosome ideograms and length 
distribution (inset) of the 22 ADS-iPSC non-CG mega-DMRs. Blue circles and 
lines indicate location of individual DMRs. Red ellipses indicate the location of 
centromeres. b, Normalized mCH levels over all non-CG mega-DMRs and 
flanking genomic regions. c, Lower y-axis as in b for the cell lines indicated. 
Upper y-axis shows normalized H3K9me3 ChIP-Seq read density throughout 
the non-CG mega-DMRs and flanking genomic regions. Dashed blue arrows 
indicate the inverse relationship between mCH and H3K9me3. d, Plot shows 
normalized mCG levels over the non-CG mega-DMRs and flanking genomic 
regions. Inset is a data browser representation of DNA methylation where 
vertical bar height indicates mC level at the 5’ of a non-CG mega-DMR and 
PMD. e, Normalized mCH levels over a non-CG mega-DMR on chromosome 
22 and flanking regions. Top panel shows gene models and ADS-iPSC mCH. 
f, Comparison of transcript abundance between H1 and ADS-iPSC. Each dot 
represents a RefSeq gene within the 22 non-CG mega-DMRs. Red dots indicate 
genes that have a CG-DMR within 2 kb of the transcriptional start site. Blue 
dots indicate genes that have a CG-DMR within 2 kb of the transcriptional start 
site, are hypermethylated in all iPSC lines and are associated with loss of 
H3K27me3. Red dashed lines represent twofold difference. g, The number of 
genes with a given transcript abundance ratio between H1 and ADS-iPSCs for 
all RefSeq genes within the non-CG mega-DMRs. 


Concluding remarks 


Through generation of the first unbiased, whole-genome, single-base- 
resolution DNA methylomes for a variety of human iPSCs and ES 
cells we have gained several new insights into the epigenomic repro- 
gramming process. Reprogramming induces a remarkable reconfi- 
guration of the DNA methylation patterns throughout the somatic 
cell genome, returning PMDs to a fully methylated state, reinstating 
non-CG methylation, and reprogramming most unmethylated and 
methylated CG islands to an ES-cell-like state. Overall, this process 
generates an iPSC methylome that, in general, is very similar to that of 
ES cells. 

On closer inspection we identified numerous differences in DNA 
methylation between ES cells and iPSCs. In terms of mCG, reprogram- 
ming generated hundreds of differentially methylated regions, most 
associated with CG islands and genes, and seeming to represent both 
memory of the somatic cell DNA methylation patterns as well as iPSC- 
specific DNA methylation patterns. Notably, many of the CG-DMRs 
were shared between independent iPSC lines, indicating that these loci 
are inherently susceptible to aberrant methylation in the reprogram- 
ming process. Further, the presence of unique CG-DMRs in each iPSC 
line indicates that in addition to the aforementioned susceptible 
regions, there may be a stochastic element to reprogramming that 
results in interclone variability. Lastly, both somatic memory and 
iDMRs can be transmitted at high frequency through differentiation. 

We also identified megabase-scale genomic regions that were 
repeatedly resistant to reprogramming of non-CG methylation, and 
were associated with altered H3K9me3 and transcriptional activity, 
constituting phenotypic differences at the transcriptional level that 
could have downstream consequences for iPSC or derived somatic cell 
function. The close proximity of the non-CG mega-DMRs to centro- 
meres and telomeres indicates that there could be distinct molecular 
properties of these chromosomal regions—for example particular 
histone variants—which impede the reprogramming process. Together, 
the non-CG mega-DMRs, common CG-DMRs in all iPSC lines, trans- 
mitted CG-DMRs and differentially expressed genes are potentially 
useful as diagnostic markers for incomplete iPSC reprogramming, 
characterization of the efficacy of different reprogramming techniques, 
and potential propagation of altered methylation states into derivative 
differentiated cells. From these first comprehensive whole-genome, 
base-resolution methylome maps it seems clear that iPSCs are fun- 
damentally distinct from ES cells, insofar as they manifest common, 
quantifiable epigenomic differences. Continued study of a wide variety 
of ES cells is needed to understand the full range of epigenomic vari- 
ability, and to potentially identify factors that enable complete repro- 
gramming to occur. 


METHODS SUMMARY 

Biological materials and sequencing libraries. Strand-specific mRNA-Seq 
libraries were produced as described previously'*. MethylC-Seq libraries were 
generated by ligation of methylated sequencing adapters to fragmented genomic 
DNA followed by purification, sodium bisulphite conversion and 4-8 cycles of 
polymerase chain reaction (PCR) amplification as described previously'* with 
minor modifications (see Supplementary Materials). ChIP-Seq libraries were pre- 
pared following Illumina protocols with minor modifications (see Supplementary 
Materials). Sequencing was performed using the Illumina Genome Analyser IIx and 
HiSeq2000 instruments as per the manufacturer’s instructions. 

Read processing and alignment. MethylC-Seq sequencing data was processed 
using the Illumina analysis pipeline and FastQ format reads were aligned to the 
human reference genome (hg18) using the Bowtie algorithm”? as described previ- 
ously’* with minor modifications (see Supplementary Materials). mRNA-Seq 
reads were uniquely aligned to the human reference (hg18) and quantified using 
the TopHat*' and Cufflinks* algorithms. Base calling and mapping of ChIP-Seq 
reads was performed using the Illumina analysis pipeline. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 


Cell culture. ADS cells were obtained from Invitrogen (catalogue no. R7788110) 
and cultured under recommended conditions. ADS cells were grown in 10-cm² 
dishes (5,000 cells cm⁻²). For making iPSCs, ADS cells (3,000 cm⁻²) were plated in 
six-well plates. The cells were infected with the combination of human reprogram- 
ming retroviruses (MYC, KLF4, OCT4, or SOX2 in pMXs; Addgene) that had been 
produced in 293T cells co-transfected with gag/pol and VSV-G as described earlier. 
On day 5, cells were passed onto 6-cm dishes without MEFs. Cells were cultured in 
DMEM/F12 plus 20% knockout serum replacement (KSR) medium supplemented 
with β-mercaptoethanol (0.1%), non-essential amino acids (NEAA) (1×), 
Glutamax (1%), and 10 ng ml⁻¹ FGF2. Medium was changed every day. On days 
18-28, individual colonies were picked and cultured feeder-free in defined 
mTeSR1 medium on plates coated with Matrigel (BD Biosciences). The profiled 
ADS-iPSC clone was assayed for pluripotency by analysis of the transcript abund- 
ance of pluripotency markers, and in vitro and in vivo (teratoma) differentiation 
into three germ layers, as described previously’. For differentiation from ADS cells 
to mature adipocytes in vitro, ADS cells (10,000 cm⁻²) were plated on 10-cm² 
dishes with growth media. Differentiation was induced for 14 days using medium 
consisting of DMEM/F12, 10% KSR, and an adipogenic cocktail (0.5 mM IBMX, 
0.25 μM dexamethasone, 1 μg ml⁻¹ insulin, 0.2 mM indomethacin and 1 μM 
pioglitazone). For collecting mature adipocytes, the cells were detached with trypsin, 
then neutralized. After centrifuging detached cells, floated fat cells were transferred 
into new tubes. H9 cells were passage 42 including several passages in mTeSR1. 
IMR90-iPSCs were derived by lentiviral integration as reported previously’, and 
were passage 65, with 33 passages in mTeSR1. FF-iPSC lines were derived using non- 
integrating episomal vectors as described previously’. FF-iPSC 19.7 (DF19-9-7) and 
FF-iPSC 19.11 (DF19-9-11) cells were subclones isolated from a single repro- 
grammed iPSC line (DF19-9), and were cultured independently for at least 20 
passages. Before cell harvest aliquots of cells were assayed for OCT4 expression by 
flow cytometry as described previously****. Cells were also submitted to the WiCell 
Cytogenetics Laboratory to confirm normal karyotype. For BMP4 differentiation, 
H1 or FF-iPSC 19.11 cells were grown in 10-cm’ dishes (approximately 1 X 10’ cells 
per dish) in feeder-free conditions on Matrigel using mTeSR1 media containing 
50 ng ml⁻¹ BMP4 for 5 days (RND systems). 

MethylC-Seq library generation. Five micrograms of genomic DNA was 
extracted from frozen cell pellets using the DNeasy Mini Kit (Qiagen) and spiked 
with 25 ng unmethylated Lambda cI857 Sam7 DNA (Promega). The DNA was 
fragmented with a Covaris S2 (Covaris) to 75-175 bp or 100-400 bp for single-read 
or paired-read libraries, respectively, followed by end repair and addition of a 3’ A 
base. Cytosine-methylated adapters provided by Illumina were ligated to the soni- 
cated DNA as per the manufacturer’s instructions for genomic DNA library con- 
struction. For single-read libraries, adaptor-ligated DNA was isolated by two 
rounds of purification with AMPure XP beads (Beckman Coulter Genomics). 
For paired-read libraries, adaptor-ligated DNA of 275-375 bp (150-250 bp insert) 
was isolated by 2% agarose gel electrophoresis. Adaptor-ligated DNA (≤450 ng) 
was subjected to sodium bisulphite conversion using the MethylCode kit (Life 
Technologies) as per the manufacturer’s instructions. The bisulphite-converted, 
adaptor-ligated DNA molecules were enriched by 4-8 cycles of PCR with the 
following reaction composition: 2.5 U of uracil-insensitive PfuTurboCx Hotstart 
DNA polymerase (Stratagene), 5 μl 10× PfuTurbo reaction buffer, 31 μM dNTPs, 
1 μl Primer 1, 1 μl Primer 2 (50 μl final). The thermocycling parameters were: 95 °C 
for 2 min, 98 °C for 30 s, then 4-8 cycles of 98 °C for 15 s, 60 °C for 30 s and 72 °C 
for 4 min, ending with one 72 °C for 10 min step. The reaction products were 
purified using AMPure XP beads. Up to two separate PCR reactions were per- 
formed on subsets of the adaptor-ligated, bisulphite-converted DNA, yielding up 
to two independent libraries from the same biological sample. Final sequence 
coverage was obtained by sequencing all libraries for a sample separately, thus 
reducing the incidence of ‘clonal’ reads that share the same alignment position and 
probably originate from the same template molecule in each PCR. The sodium 
bisulphite non-conversion rate was calculated as the percentage of cytosines 
sequenced at cytosine reference positions in the Lambda genome. 
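
As an illustration of this calculation, the sketch below computes the non-conversion rate from the unmethylated Lambda spike-in and subtracts it from a raw methylation fraction, as described for Fig. 1b, c. The function names are hypothetical, and clamping negative corrected values to zero is an assumption rather than something stated in the text.

```python
def non_conversion_rate(lambda_c_calls, lambda_total_calls):
    """Fraction of base calls read as C at reference-C positions in Lambda.

    The spike-in is unmethylated, so every C call there reflects a cytosine
    that failed to convert (or a sequencing error).
    """
    if lambda_total_calls <= 0:
        raise ValueError("no base calls at Lambda cytosine positions")
    return lambda_c_calls / lambda_total_calls


def corrected_methylation_percent(methylated_calls, total_calls, non_conv_rate):
    """Per cent of base calls methylated, minus the non-conversion frequency."""
    if total_calls <= 0:
        raise ValueError("no covered base calls")
    raw = methylated_calls / total_calls
    # Assumption: a corrected value below zero is reported as zero.
    return max(0.0, (raw - non_conv_rate) * 100.0)
```

For example, 5 C calls among 1,000 base calls at Lambda cytosine positions give a non-conversion rate of 0.5%, which corresponds to a conversion rate of 99.5%.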

Directional RNA-Seq library generation. Total RNA was isolated from cell 
pellets treated with RNAlater using the RNA mini kit (Qiagen) and treated with 
DNasel (Qiagen) for 30 min at room temperature (22 °C). After ethanol precipi- 
tation, biotinylated LNA oligonucleotide ribosomal RNA (rRNA) probes com- 
plementary to the 5S, 5.8S, 12S, 18S and 28S rRNAs were used to deplete the 
rRNA from 5 μg of total RNA by RiboMinus (Life Technologies) as per the 
manufacturer’s instructions. Purified RNA (50 ng) was fragmented by metal 
hydrolysis in 1× fragmentation buffer (Life Technologies) for 15 min at 70 °C, 
stopping the reaction by addition of 2 μl fragmentation stop solution (Life 
Technologies). Fragmented RNA was used to generate strand-specific RNA- 
Seq libraries as per the Directional mRNA-Seq Library Preparation Protocol 
(Illumina). 


Chromatin immunoprecipitation and ChIP-Seq library generation. Chromatin 
immunoprecipitation (ChIP) and Illumina sequencing for H3K9me2 and 
H3K27me3 was performed as described previously”. 

Mapping retroviral insertion sites. MMLV retroviral insertion sites in ADS- 
iPSC genomic DNA were identified by an adaptor ligation-mediated method for 
genome-wide mapping of insertions, as described previously*’, except with the 
following modifications. Genomic DNA was fragmented by sonication with a 
Covaris S2, followed by ligation of modified 5’ or 3’ long terminal repeat (LTR)- 
specific Illumina adapters: 5'-LTR (5'-3’): CAAGCAGAAGACGGCATACGAG 
ATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTTCAGTGCAG 
CTGTTCCATCTGTTCTTGGCCC; 3’-LTR (5'-3’): CAAGCAGAAGACGG 
CATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTT 
CAGTGGCCAGTCCTCCGATTGACTGAGTCGG. A single mapping library was 
made for each of the 5’ and 3’ LTRs, and each library was sequenced on the 
Illumina Genome Analyser IIx. Each valid read contained the barcode sequence 
‘TCAGTG’ prepended to the 5’ of the genomic DNA read sequence. Retroviral 
insertion sites were identified by localized enrichment of greater than 300 reads 
within a 2-kb window, in both the 5’ LTR and 3’ LTR mapping libraries, and 
located on opposite genome strands between the two libraries. Cloning and 
Sanger sequencing of library molecules from the 3’ LTR mapping library con- 
firmed genomic DNA retroviral insertion sites for a representative fraction of the 
17 insertion sites identified by high-throughput sequencing. 
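
The enrichment criterion above can be sketched as follows. This is a simplified illustration that assumes fixed, non-overlapping 2-kb windows (the text does not say whether windows were fixed or sliding) and reads supplied as (chromosome, position, strand) tuples; all names are hypothetical.

```python
from collections import Counter


def enriched_windows(reads, window=2000, min_reads=300):
    """Return (chrom, window index, strand) keys holding more than min_reads reads."""
    counts = Counter((chrom, pos // window, strand) for chrom, pos, strand in reads)
    return {key for key, n in counts.items() if n > min_reads}


def candidate_insertions(reads_5ltr, reads_3ltr, window=2000, min_reads=300):
    """Windows enriched in both LTR libraries, on opposite genome strands."""
    five = enriched_windows(reads_5ltr, window, min_reads)
    three = enriched_windows(reads_3ltr, window, min_reads)
    opposite = {"+": "-", "-": "+"}
    sites = set()
    for chrom, w, strand in five:
        if (chrom, w, opposite[strand]) in three:
            # Report the window as a half-open interval [start, end).
            sites.add((chrom, w * window, (w + 1) * window))
    return sorted(sites)
```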

High-throughput sequencing. Single-read MethylC-Seq and RNA-Seq libraries 
were sequenced for up to 85 cycles using the Illumina Genome Analyser IIx. 
Paired-read MethylC-Seq libraries were sequenced for up to 75 cycles for each 
read using the Illumina HiSeq2000. Image analysis and base calling were per- 
formed with the standard Illumina pipeline, performing automated matrix and 
phasing calculations on a control library that was sequenced in a single lane of 
each flowcell. 

Processing and alignment of MethylC-Seq data to identify methylated cyto- 
sines. All sequence alignments were performed against the NCBI36/hg18 human 
reference. Single-read MethylC-Seq sequences were processed and aligned as 
described previously'*, except an additional filter was added to remove any 
mapped reads in which a read-C base was aligned to a reference-T base. 
Paired-read MethylC-Seq data was mapped and processed as described previ- 
ously’® with the following modifications to accommodate the paired-read data- 
type. Both reads in a pair were trimmed of any low-quality sequence at their 3’ 
ends and mapped to the reference genome with Bowtie v. 0.12.5°° in paired-read 
mode, using the following parameters: -e 90 -l 20 -n 0 -k 10 -o 4 -I 0 -X 550 
--pairtries 100 --nomaqround --solexa1.3-quals. Mapped reads in a read pair that 
overlapped were trimmed from their respective 3’ ends until the reads no longer 
overlapped, leaving a 1-bp gap. 
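
A minimal sketch of the overlap trimming is given below. It assumes half-open reference intervals, with the forward read trimmed at its right end and the reverse-strand mate trimmed at its left end (their respective 3’ ends), and it alternates single-base trims between the two reads; the text does not specify how the trimming was divided between mates, so that choice is an assumption.

```python
def trim_overlap(fwd, rev, gap=1):
    """Trim an overlapping read pair from the 3' ends until a gap separates them.

    fwd and rev are (start, end) half-open reference intervals for the
    forward-strand read and its reverse-strand mate. Pairs that do not
    overlap are returned unchanged.
    """
    f_start, f_end = fwd
    r_start, r_end = rev
    if r_start >= f_end:
        return fwd, rev  # no overlap, nothing to trim
    trim_fwd = True
    while r_start - f_end < gap and f_end > f_start and r_end > r_start:
        if trim_fwd:
            f_end -= 1    # remove one base from the 3' end of the forward read
        else:
            r_start += 1  # remove one base from the 3' end of the reverse read
        trim_fwd = not trim_fwd
    return (f_start, f_end), (r_start, r_end)
```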

Mapped reads were filtered as follows: any read with more than three mis- 
matches was trimmed from the 3’ end to contain three mismatches, any read pair 
that contained a cytosine mapped to a reference sequence thymine was removed, 
and any read pairs that had more than three cytosines in the non-CG context 
within a single read was removed (possible non-conversion in bisulphite reac- 
tion). Read pairs were then collapsed to remove clonal reads potentially produced 
in the PCR amplification from the same template molecule, based on a common 
start position of read 1. The total uniquely mapped, non-clonal read number for 
each library, average coverage and total sequence yield are detailed in Supplemen- 
tary Table 1. 
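
The pair-level filters and the clonal collapse can be sketched as below. The sketch omits the mismatch trimming step, assumes each pair has already been annotated with the relevant counts, and keeps the first pair seen at each read 1 start position; the record layout, the inclusion of strand in the duplicate key and the choice of which duplicate to keep are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    chrom: str
    strand: str
    read1_start: int
    has_c_at_ref_t: bool       # a read cytosine aligned to a reference thymine
    non_cg_c_per_read: tuple   # non-CG cytosine counts for (read 1, read 2)


def passes_filters(pair, max_non_cg_c=3):
    """Apply the two removal criteria described in the text."""
    if pair.has_c_at_ref_t:
        return False
    # More than three non-CG cytosines in one read suggests non-conversion.
    return all(n <= max_non_cg_c for n in pair.non_cg_c_per_read)


def collapse_clonal(pairs):
    """Keep one pair per (chrom, strand, read 1 start) to remove PCR duplicates."""
    seen, kept = set(), []
    for pair in pairs:
        key = (pair.chrom, pair.strand, pair.read1_start)
        if key not in seen:
            seen.add(key)
            kept.append(pair)
    return kept


def filter_and_collapse(pairs):
    return collapse_clonal([p for p in pairs if passes_filters(p)])
```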

For all MethylC-Seq data sets, methylated cytosines were identified from the 
mapped and processed read data as described previously'*. The bisulphite con- 
version rates for all samples were over 99% (Supplementary Table 1). Correction 
of any DNA methylation sites incorrectly categorized as non-CG owing to SNPs 
in the sample versus reference genomes was performed as described previously”. 

For the previously published HSF1** and H9-Laurent™* data sets, the GEO 
sequence read data were mapped using the MethylC-Seq pipeline (H9-Laurent) 
and BS Seeker (HSF1)** (settings: -e = 55, -m 3), and post-processing and 
methylcytosine identification were performed using the MethylC-Seq pipeline as 
described earlier. 

Genome annotation. Genomic regions and CG islands were defined based on 
NCBI build 36/hg18 coordinates downloaded from the UCSC website. Promoters 
were arbitrarily defined as transcriptional start site ±500 bp or 2,000 bp for each 
RefSeq transcript (as indicated in the text). According to the UCSC annotation, 
many RefSeq transcripts can be associated with a given gene, and they can have 
the same or alternative transcriptional start sites. Gene bodies are defined as the 
transcribed regions, from the start to the end of transcription sites for each 
RefSeq transcript. 
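
As an illustration of the promoter definition, the sketch below builds a window around a transcriptional start site. It assumes the window is symmetric (the flank extends the same distance on both sides of the start site) and that coordinates are clipped at zero; the function name and the example coordinate are hypothetical.

```python
def promoter_interval(tss, flank=500):
    """Return (start, end) of a window spanning `flank` bp on each side of the TSS.

    Assumption: a symmetric window, clipped so the start is never negative.
    """
    return (max(0, tss - flank), tss + flank)


# Example with a made-up coordinate: the narrow and the wide promoter definition.
narrow = promoter_interval(10_000, flank=500)
wide = promoter_interval(10_000, flank=2_000)
```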

mC and histone profiles. In Fig. 3a each CG-DMR was divided into 20 equally 
sized bins. The average methylation for all cytosines in the CG context within a 
bin in one sample was determined and normalized by the bin size. Lastly, the 
whole data set was divided by its 70th percentile, and values higher than 1 were 
forced to 1. This was performed to produce a meaningful mapping between values 
and colours in the heatmap key, and to avoid extreme values masking the methy- 
lation levels of other CG-DMRs. CG-DMRs were then reorganized based on their 
similarity by means of complete linkage hierarchical clustering, using the heatmap.2 
R function. 
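
The binning and scaling used for the heatmap can be sketched as below. This is an illustration under stated assumptions rather than the original analysis code: the function names are hypothetical, bins without any CG site are reported as `None`, and the percentile uses linear interpolation between order statistics (the method behind the reported 70th percentile is not stated in the text).

```python
def bin_profile(sites, start, end, n_bins=20):
    """Average CG methylation per bin, divided by the bin size in bp.

    sites: iterable of (position, methylation_level) for CG cytosines.
    """
    bin_size = (end - start) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pos, level in sites:
        if not start <= pos < end:
            continue
        idx = min(int((pos - start) / bin_size), n_bins - 1)
        sums[idx] += level
        counts[idx] += 1
    return [
        (sums[i] / counts[i]) / bin_size if counts[i] else None
        for i in range(n_bins)
    ]


def percentile(values, fraction):
    """Percentile by linear interpolation (an assumed method)."""
    ordered = sorted(values)
    h = (len(ordered) - 1) * fraction
    lo = int(h)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (h - lo) * (ordered[hi] - ordered[lo])


def scale_and_cap(matrix, fraction=0.70):
    """Divide every value by the chosen percentile and force values above 1 to 1."""
    observed = [v for row in matrix for v in row if v is not None]
    ref = percentile(observed, fraction)
    if ref <= 0:
        raise ValueError("reference percentile must be positive")
    return [
        [None if v is None else min(v / ref, 1.0) for v in row]
        for row in matrix
    ]
```

Capping at the percentile keeps a few highly methylated bins from compressing the colour range of all other regions, which is the motivation given in the text.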

In Fig. 4a each of the 11 CG-DMRs consistently hypermethylated in the 5 iPSC 
lines was profiled for both mCG and the H3K27me3 histone mark throughout the 
CG-DMR and equivalent upstream and downstream genomic regions divided into 
30 equal-length bins. For DNA methylation, for each bin in each sample the total 
number of methylated/(methylated + unmethylated) reads was determined over 
the whole set of considered CG-DMRs. Final profiles were normalized by dividing 
them by their maximum value. For the H3K27me3 histone modification ChIP-Seq 
reads, RPKM values were determined in each CG-DMR and normalized to the 
average of the upstream/downstream flanking region RPKM values. 
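
The two normalizations described for Fig. 4a can be sketched as follows. The data layout (one list of per-bin read-count pairs per region) and the function names are assumptions made for illustration; bins without reads are given a fraction of 0 here, which the text does not specify.

```python
def pooled_fraction_profile(regions):
    """Per-bin methylated/(methylated + unmethylated), pooled over all regions,
    then divided by the maximum of the profile.

    regions: list of regions, each a list of (methylated, unmethylated)
    read counts, one pair per bin; all regions use the same number of bins.
    """
    n_bins = len(regions[0])
    profile = []
    for b in range(n_bins):
        meth = sum(region[b][0] for region in regions)
        unmeth = sum(region[b][1] for region in regions)
        total = meth + unmeth
        profile.append(meth / total if total else 0.0)
    peak = max(profile)
    return [v / peak for v in profile] if peak else profile


def normalize_to_flanks(region_rpkm, upstream_rpkm, downstream_rpkm):
    """ChIP-Seq RPKM in a region relative to the mean RPKM of its two flanks."""
    flank_mean = (upstream_rpkm + downstream_rpkm) / 2
    return region_rpkm / flank_mean
```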

Figure 5b is as in Fig. 4a lower axis, but based on the mC in the CH sequence 
context profiled over the non-CG mega-DMRs and upstream/downstream flank- 
ing regions, minus the non-conversion frequency. The final profiles were normal- 
ized to their maximum level. 

Figure 5c is as in Fig. 4a lower axis, but based on the mC in the CH sequence 
context profiled over non-CG mega-DMRs and upstream/downstream flanking 
regions minus the non-conversion frequency. In the upper axis the H3K9me3 
histone modification ChIP-Seq reads were profiled as described for the 
H3K27me3 profiles in Fig. 4a. 

Figure 5d is as in Fig. 4a lower axis, but based on the mC in the CG sequence 
context profiled over non-CG mega-DMRs and upstream/downstream flanking 
regions. Profiles were normalized to their maximum levels. 

Figure 5e is as in Fig. 4a lower axis for one example non-CG mega-DMR using 
10-kb bins. 

Clustering of mC profiles and chromosome 10 smoothed profiles. The methy- 
lation level for each C in the CG, CHG and CHH sequence context was summed in 
adjacent 10-kb windows over all autosomal chromosomes. Non-CG DNA methy- 
lation profiles were determined by adding mCHG and mCHH profiles. Clustering 
was performed based on the Pearson correlation over all 10-kb windows transformed 
into a distance measure (as 1 − Pearson correlation) and using the hclust R 
function. Data for smoothing of non-CG mC on chromosome 10 were retrieved as 
for the clustering. In addition, smoothing with cubic splines was determined before 
plotting using the smooth.spline R function with spar argument set to 0.3. 
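
The distance measure used for clustering can be sketched as below: each sample is a vector of per-window methylation sums, and the distance between two samples is 1 minus their Pearson correlation. The function names are hypothetical, and the sketch stops at the distance matrix; the hierarchical clustering itself was done with R's hclust according to the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_distance_matrix(profiles):
    """Pairwise 1 - Pearson correlation for a list of per-window profiles."""
    return [
        [1.0 - pearson(a, b) for b in profiles]
        for a in profiles
    ]
```

A distance of 0 means two samples have perfectly correlated profiles across windows, and a distance of 2 means they are perfectly anti-correlated.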

Identification of DMRs. Non-CG mega-DMRs (Fig. 5) were identified by com- 
paring H1 to ADS-iPSC mCHG and mCHH smoothed methylation profiles. The 
average methylation level of mC called (1% FDR) in the mCHG and mCHH 
sequence context was determined in 5-kb windows (sW). The genome was 
scanned considering groups of 10 adjacent windows sW over a distance less than 
50 kb. The set of 10 smoothed values for mCHG in the H1 sample was compared 
to the set of 10 smoothed values in the ADS-iPSC sample using the 
Wilcoxon test. For both sets, at least 4 non-missing data points (that is, with 
sequence coverage) were required. Resulting P values were corrected with the 
Benjamini-Hochberg method. Regions with P value < 0.01 (1% FDR) and 8-fold 
enrichment of methylation level were identified, and regions closer than 100 bp 
were joined. This was repeated for the mC in the CHH sequence context. Lastly, 
mCHG and mCHH DMRs overlapping or closer than 100 kb were joined and the 
final set of regions was checked for having mCHG+mCHH fold enrichment of at 
least 2-fold between H1 and ADS-iPSCs. This set of 78 DMRs hypomethylated in 
ADS-iPSCs (Supplementary Fig. 15c–f) was further refined, considering the size 
and overlap with repressive histone marks. The final set of 22 regions reported in 
Fig. 5 includes all the DMRs larger than 1 Mb (17) and a range of smaller ones. 
Also, the 22 final non-CG mega-DMRs encompass ~92% of the initial set of 78 
DMRs, based on size in bp. 
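
The multiple-testing correction and thresholding step can be sketched as follows. The sketch assumes that a Wilcoxon P value and a fold enrichment have already been computed for every candidate group of windows; the dictionary keys `p` and `fold` and the function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_candidates(windows, alpha=0.01, min_fold=8.0):
    """Keep windows with corrected P value < alpha and fold >= min_fold."""
    corrected = benjamini_hochberg([w["p"] for w in windows])
    return [
        w for w, q in zip(windows, corrected)
        if q < alpha and w["fold"] >= min_fold
    ]
```

With the thresholds from the text (1% FDR and 8-fold enrichment for the non-CG comparison), a window must pass both criteria to be kept.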

CG-DMRs (Fig. 3) were identified similarly to non-CG mega-DMRs. The smoothed 
average methylation level was determined in 100-bp windows (sW), and regions 
comprising a set of 10 adjacent windows sW over a distance less than 1,100 bp 
were considered. The Kruskal-Wallis test was used to score each region based on 
the methylation levels from the two ES cell and five iPSC lines. Regions with 
corrected P value < 0.01 (1% FDR) and 4-fold enrichment of methylation level 
(max/min over the 7 cell lines for each region) were identified, and regions closer 
than 100 bp were joined, resulting in a final set of 1,175 CG-DMRs. Regarding the 
H1 versus H9 comparison, the non-parametric Wilcoxon test was applied: at 1% 
FDR and minimum 4-fold enrichment no CG-DMRs could be identified, while 
only at 10% FDR and 4-fold enrichment could H1 versus H9 CG-DMRs be 
identified. This 10% FDR set has an overlap of 131 kb with the final set of 1,175 
CG-DMRs. For these reasons, the DMRs that visually appear different 
between H1 and H9 in the Fig. 3 heatmap are either above the 1% FDR threshold 
(H1 versus H9) or have insufficient sequence coverage in one of the two samples. 
(Regions without sequence coverage are not indicated in the heatmap, but are 
considered in the DMR selection. White spots in the heatmap are indicative of 
missing mCG; this can be due to either lack of sequence coverage or sufficient 
coverage and absence of mCG.) These regions are included in the list of the 1,175 
CG-DMRs at the 1% FDR level based on inclusion of the iPSC data. 
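
Two of the CG-DMR steps, the max/min fold enrichment across cell lines and the joining of regions closer than 100 bp, can be sketched as below. The function names are hypothetical; regions are assumed to be (start, end) pairs on a single chromosome, and the handling of a zero minimum is an assumption because the text does not describe it.

```python
def max_min_fold(levels):
    """Fold enrichment as max/min of the methylation levels across cell lines."""
    lo, hi = min(levels), max(levels)
    if lo == 0:
        # Assumed convention: infinite fold if any line is methylated, else 1.
        return float("inf") if hi > 0 else 1.0
    return hi / lo


def merge_close_regions(regions, max_gap=100):
    """Join overlapping regions and regions separated by less than max_gap bp."""
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] < max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```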

For the analysis of CG island reprogramming, the CG-DMRs were identified as 
for the Fig. 3 CG-DMRs (minimum enrichment 2-fold) but including the IMR90, 
ADS-adipose, ADS and FF differentiated cell lines in addition to the two ES cell 
and the five iPSC lines. 

CG island reprogramming analysis was carried out as follows. CG-DMRs dif- 
ferent between ES cells and differentiated cells were defined within the set of CG- 
DMRs identified comparing all analysed methylomes (see earlier), considering 
only CG-DMRs overlapping with CG islands. In particular, for each of these 
CG-DMRs the mCG/bp levels in 20 equally sized bins were profiled in all cell types. 
DMRs with pooled mCG/bp levels different between differentiated and ES cell lines 
were identified (Wilcoxon test P value < 0.01, and P value > 0.05 between H1 and 
H9). Similarly, the set of reprogrammed CG-DMRs was identified by comparing 
pooled iPSC mCG profiles with the ES cell samples (Wilcoxon test P value > 0.05). 

CG-DMR reprogramming analysis was carried out as follows. CG-DMRs 
aberrant in iPSCs and like or unlike parental cells were defined within the set of 
1,175 CG-DMRs identified comparing all ES cell and iPSC samples. In particular, for 
each of these CG-DMRs the mCG/bp levels in 20 equally sized bins were profiled in all 
cell types. CG-DMRs aberrant in each iPSC line were identified by comparing their 
mCG/bp to both H1 and H9 ES cell lines (two-tailed Wilcoxon test P value < 0.05 for 
both, and P value > 0.01 between H1 and H9). Hypermethylated and hypomethylated 
CG-DMRs were identified in the same way but using a one-tailed test. Memory 
and iPSC-specific (iDMR) CG-DMRs were identified by comparing the mCG/bp 
density between each iPSC and its parental line (Wilcoxon test P value > 0.01 and P 
value < 0.01, respectively). 
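
The decision rules in this paragraph can be summarized in a short sketch. It assumes the four Wilcoxon P values for one CG-DMR and one iPSC line have already been computed; the function name, argument names and returned labels are hypothetical, and a P value exactly equal to 0.01 against the parental line is left unassigned because the text gives only strict inequalities.

```python
def classify_cg_dmr(p_vs_h1, p_vs_h9, p_h1_vs_h9, p_vs_parent):
    """Classify one CG-DMR in one iPSC line from precomputed Wilcoxon P values."""
    aberrant = p_vs_h1 < 0.05 and p_vs_h9 < 0.05 and p_h1_vs_h9 > 0.01
    if not aberrant:
        return "not aberrant"
    if p_vs_parent > 0.01:
        return "memory"          # iPSC resembles its parental line
    if p_vs_parent < 0.01:
        return "iPSC-specific"   # iPSC differs from its parental line
    return "unassigned"
```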

Maintained CG-DMRs were identified in the FF 19.11 iPSC line comparing the 
mCG/bp density of H1+BMP4 with both H1 and H9 (one-tailed Wilcoxon test P 
value > 0.01 for both) and FF 19.11 BMP4 to both H1 and H9 (one-tailed 
Wilcoxon test P value < 0.05 for both). 

Identification of PMDs. A sliding window approach was used to find regions of the 
genome that were partially methylated in each cell type, as described previously’*. 

Mapping RNA-Seq reads. RNA-Seq read sequences produced by the Illumina 
analysis pipeline were aligned with the TopHat software*' to the NCBI build 36/ 
hg18 reference sequence. Reads that aligned to multiple positions were discarded. 
Reads per kilobase of transcript per million reads (RPKM) values were calculated 
with the Cufflinks software’ using human RefSeq gene models. 
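
For orientation, the RPKM unit itself can be written as a one-line calculation. This is the plain textbook definition with a hypothetical function name; it is not a reimplementation of how Cufflinks estimates abundances.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / kilobases / millions


# Example with made-up numbers: 500 reads on a 2-kb transcript in a
# library of 10 million mapped reads.
example = rpkm(500, 2_000, 10_000_000)
```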

Mapping and enrichment analysis of ChIP-Seq reads. Following sequencing 
cluster imaging, base calling and mapping were conducted using the Illumina 
pipeline. Clonal reads were removed from the total mapped tags, retaining only 
the non-clonal unique tags that mapped to one location in the genome, where 
each sequence is represented once. Regions of tag enrichment were identified as 
described previously”. 

Data visualization in the AnnoJ browser. MethylC-Seq, RNA-Seq and ChIP- 
Seq sequencing reads and positions of methylcytosines with respect to the NCBI 
build 36/hg18 reference sequence, gene models and functional genomic elements 
were visualized in the AnnoJ 2.0 browser, as described previously’. The data 
mentioned above can be viewed in the AnnoJ browser at 
http://neomorph.salk.edu/ips_methylomes. 

HDACs link the DNA damage response, processing of double-strand breaks and autophagy 


Thomas Robert, Fabio Vanoli, Irene Chiolo, Ghadeer Shubassi, Kara A. Bernstein, Rodney Rothstein, 
Oronza A. Botrugno, Dario Parazzoli, Amanda Oldani, Saverio Minucci & Marco Foiani 


Protein acetylation is mediated by histone acetyltransferases (HATs) and deacetylases (HDACs), which influence 
chromatin dynamics, protein turnover and the DNA damage response. ATM and ATR mediate DNA damage 
checkpoints by sensing double-strand breaks and single-strand-DNA-RFA nucleofilaments, respectively. However, 
it is unclear how acetylation modulates the DNA damage response. Here we show that HDAC inhibition/ablation 
specifically counteracts yeast Mec1 (orthologue of human ATR) activation, double-strand-break processing and 
single-strand-DNA-RFA nucleofilament formation. Moreover, the recombination protein Sae2 (human CtIP) is 
acetylated and degraded after HDAC inhibition. Two HDACs, Hda1 and Rpd3, and one HAT, Gcn5, have key roles in 
these processes. We also find that HDAC inhibition triggers Sae2 degradation by promoting autophagy that affects the 
DNA damage sensitivity of hda1 and rpd3 mutants. Rapamycin, which stimulates autophagy by inhibiting Tor, also 
causes Sae2 degradation. We propose that Rpd3, Hda1 and Gcn5 control chromosome stability by coordinating the 
ATR checkpoint and double-strand-break processing with autophagy. 


HATs and HDACs target histones and non-histone proteins'* and 
regulate chromosome dynamics. They also influence the DNA damage 
response through acetylation of key DNA repair and checkpoint pro- 
teins’. HDACs can be classified into three classes on the basis of 
sequence similarity’. HDAC inhibition is a promising therapeutic 
strategy against cancer®, and certain inhibitors—such as valproic acid 
(VPA)’—affect class I and II HDACs. 

The DNA damage checkpoint response is mediated by two PI3 
kinases, ATR and ATM (Mec1 and Tel1 in yeast, respectively)®. ATR 
is assisted by ATRIP (Ddc2 (also known as Lcd1) in yeast) and, in 
response to DNA damage, activates a signal transduction pathway that 
coordinates cell cycle events with DNA repair and controls apoptosis in 
mammals. In yeast, the Rad53 (CHK2 (also known as CHEK2) in 
humans) protein kinase has a pivotal role in transducing ATR signal- 
ling®. Double-strand breaks (DSBs) are dangerous DNA lesions that can 
be repaired by different recombination processes, depending on the cell 
cycle phase’. In G2, DSBs are processed into single-strand DNA 
(ssDNA) and engaged into homologous recombination-mediated repair 
pathways°”°. Although several DNA repair proteins are acetylated, the 
functional significance of these modifications is mostly unknown. 

Protein acetylation has also been implicated in promoting degra- 
dation of certain proteins through autophagy~'’. Autophagy is a 
highly conserved process involved in protein and organelle turnover 
and results in their vacuolar (lysosomal in mammals) degradation. 
Crosstalk between ubiquitination and autophagy has been 
reported'*’’. Autophagy is triggered by a variety of stimuli, including 
nutrient starvation and TOR1 inhibitors, some of which are currently 
in clinical trials for cancer therapy’*”*. 

Here we report a connection between the ATR pathway, DSB 
repair, protein acetylation and autophagy. 


VPA counteracts the DNA damage response 

We investigated how HDAC inhibition by VPA affects the DNA 
damage response in budding yeast. VPA treatment per se did not 
activate Rad53 (data not shown), but counteracted Rad53 phosphor- 
ylation after exposure to 4NQO (an ultraviolet-mimetic drug) in G1 
and G2 cells or exposure to MMS (a DNA-alkylating agent) in S phase 
cells (Supplementary Fig. 1a). It is unlikely that VPA limits the accu- 
mulation of checkpoint activators, because inhibition of protein syn- 
thesis did not influence Rad53 phosphorylation (Supplementary Fig. 
1b). As cycloheximide treatment did not restore checkpoint activa- 
tion, it is also unlikely that VPA enhanced negative checkpoint reg- 
ulators (Supplementary Fig. 1b). We then analysed the effect of VPA 
in cells experiencing a single and irreparable DSB at a specific 
chromosomal locus. We overexpressed HO, a yeast nuclease that 
recognizes a specific DNA sequence” (Fig. 1a-c and Supplementary 
Fig. 2). Checkpoint activation after DSB formation requires Cdc28 
(CDK1 in mammals) activity and DSB resection, which generates 
RPA-ssDNA nucleofilaments and recruitment of the Ddc2-Mec1 
complex*”*. Ten kilobases of ssDNA must accumulate to trigger 
Rad53 activation, which occurs 90 min after DSB formation'”. After 
HO induction in G2, VPA counteracted Rad53 phosphorylation 
(Fig. 1a). VPA also affected Mec1-dependent Ddc2 and Srs2 phos- 
phorylation*’*. We next measured DSB resection at three loci (0.2, 1.6 
and 5.7 kb from the break site) (Fig. 1b and Supplementary Fig. 2a). 
Resection rates were reduced compared to untreated conditions. 
Without VPA, resection of the 5.7-kb fragment was obvious at 
120 min, whereas with VPA resection was still impaired after 
300 min. Hence, VPA-treated cells failed to accumulate the 10 kb of 
ssDNA needed for Rad53 activation. We then measured the recruitment 
of Rfa1 and Ddc2 to the DSB region with or without VPA by 

Figure 1 | VPA treatment counteracts DNA double-strand-break 
processing. a–c, RFA1::FLAG DDC2::MYC cells were arrested in G2 and 
released in YP galactose to induce HO endonuclease. After 30 min, the culture 
was split in two: +VPA and −VPA. a, Samples were processed for western blot 
using anti-Rad53, Srs2 and Ddc2 antibodies. b, A schematic diagram showing 


chromatin immunoprecipitation (ChIP) analysis (Fig. 1c and Supplementary 
Fig. 2b). VPA counteracted Rfa1 and Ddc2 recruitment 
at the 0.2-kb and 1.6-kb fragments, indicating that DSB processing 
and signalling are crippled in VPA. As DSB resection is a key step in 
homologous recombination, VPA should also impair recombination 
frequencies. Spontaneous ectopic recombination frequencies’” were 
indeed reduced by VPA treatment (Fig. 1d). Hence, VPA counteracts 
DSB resection and signalling, thus affecting homologous recombina- 
tion and the signal transduction response mediated by Mec1. 


Sae2 and Exo1 are degraded in VPA-treated cells 


Next we analysed the early events mediating DSB processing (Fig. 2). 
Mre11 is the first factor recruited to a DSB to activate Tel1*°. Mre11 
indirectly influences DSB resection and its removal from the DSB 
region depends on Sae2, a CDK1 target involved in DSB processing”””". 
Exo1, Dna2 and Sgs1 (BLM in human cells) are also implicated in 
resection” **. We reasoned that VPA could limit the recruitment of 
Mre11 at DSBs**. The timing of Mre11 loading at the 0.2-kb fragment 
was comparable with or without VPA, but Mre11 association persisted 
in VPA-treated cells (Fig. 2a and Supplementary Fig. 2c). Hence, VPA 
does not counteract DSB processing by preventing Mre11 recruitment. 

VPA treatment affected Sae2 and Exo1 protein levels. After 180 min 
of HO induction in VPA, Sae2 and Exo1 were barely detectable 
whereas Mre11 was not affected (Fig. 2b). Hence, VPA affects Sae2 
and Exo1 turnover, although with different kinetics as the decrease in 
Exo1 level was delayed compared to Sae2 (Fig. 2b and data not shown). 
These results account for the VPA-dependent accumulation of Mre11 
because Sae2 is needed for Mre11 displacement at the DSB region’**”*, 
but may also explain the VPA-induced DSB resection defect as both 
Sae2 and Exo1 influence DSB resection. 


VPA stimulates autophagy 

We tested whether VPA induces autophagy in yeast as in mammals”. 
Autophagy induction correlates with: (1) vacuolar staining of Cherry- 
Ape1, an aminopeptidase specific for the CVT (cytoplasm to vacuole 
probe locations with respect to the HO cut site. The resection rate was calculated 
as the rate of HO cut band disappearance. c, Fold enrichment of the 0.2-kb 
fragment was calculated after ChIP of Rfa1-Flag and Ddc2-Myc. d, VPA effect on 
ectopic recombination (rec.) in wild-type (WT) cells. Error bars represent 
standard deviation (s.d.) calculated from four independent experiments. 


targeting) subpathway, and perivacuolar foci and vacuolar staining of 
GFP-Atg8, an autophagosome component”; (2) increased enzymatic 
activity of Pho8Δ60, an autophagy marker**; and (3) processing of 
GFP-Atg8”. 

Yeast cells grown in YPD medium (Fig. 3a) showed mostly Cherry- 
Ape1 foci but very little Cherry vacuolar staining. The foci may reflect 

Figure 2 | VPA affects Sae2 and Exo1 but not Mre11 protein levels. 
a, MRE11::MYC cells were treated as in Fig. 1a. Cell samples were processed for 
ChIP analysis and the fold enrichment of the 0.2-kb fragment after ChIP of 
Mre11-Myc without (−VPA) or with (+VPA) VPA was calculated. b, EXO1::FLAG 
SAE2::PK MRE11::MYC cells were grown as in a. Cell samples were taken and 
processed for western blot analysis using anti-Flag, PK and Myc antibodies. 

Ape1 oligomer formation after synthesis’. Under conditions of nitro- 
gen starvation (SD-N medium), cells exhibiting fluorescent vacuolar 
staining increased, whereas the foci diminished. These results reflect 
starvation-induced autophagy”. VPA treatment partially mimicked 
starvation conditions. YPD cells showed few GFP-Atg8 foci and little 
vacuolar staining. Conversely, starved cells expressed GFP-Atg8 foci 
and showed vacuolar staining. This trend was recapitulated in VPA. 
We then measured the Pho8Δ60 activity (Fig. 3b) in wild-type cells 
and in mutants in ATG1, encoding an essential autophagy kinase*®. 
YPD wild-type and atg1 mutants exhibited basal Pho8Δ60 activity. 
Under starvation or in VPA, Pho8Δ60 activity increased in wild-type 
cells but not in atg1 mutants. We then analysed GFP-Atg8 processing 
(Fig. 3c). Whereas YPD wild-type cells did not undergo GFP-Atg8 
cleavage, starved and VPA-treated cells exhibited Atg1-dependent 
GFP-Atg8 processing. These results indicate that VPA induces autophagy. 


VPA-induced Sae2 acetylation and degradation 

Next we tested whether Sae2 and/or Exo1 were acetylated. We immu- 
noprecipitated overexpressed HA-Sae2 in cells ± VPA with anti-HA 
and subsequently ± anti-acetyl-Lysine antibodies. Recovered Sae2 
increased in VPA (Fig. 4a), indicating that Sae2 is acetylated. We 
failed to detect acetylated Exo1, although acetylated Exo1 has been 
previously described*. Autophagy is the preferred pathway for the 
degradation of oligomeric complexes that cannot be recognized by 
the ubiquitin-proteasome system and form toxic aggregates'**’. 
Certain proteins are specifically shunted into the autophagic pathway 
when they are hyperacetylated’’. Hence, Sae2 might be degraded 
through autophagy, as its level declines in VPA, it is acetylated and 
forms complexes”. We tested whether Sae2 disappearance in VPA 
was dependent on autophagy (Fig. 4b-d). Phenylmethylsulphonyl 
fluoride (PMSF) inhibits serine proteases but also blocks autophagy 
by inhibiting vacuolar proteases**. PMSF treatment counteracted Sae2 
disappearance in VPA (Fig. 4b). We also tested whether genetic inac- 
tivation of autophagy would affect Sae2 levels. Deletion of ATG1 or 
ATG19 (specific for the CVT pathway)” partially compensated Sae2 
destabilization in VPA (Fig. 4c). Finally, rapamycin, which induces 
autophagy by inhibiting Tor1*, also destabilized Sae2 in an ATG1- 
dependent manner (Fig. 4d and data not shown). Thus, in VPA, 
autophagy contributes to Sae2 degradation perhaps through acetyla- 
tion as has been reported for the Huntingtin protein”. 


Conditions   Total cell number   Cherry foci   Cherry vacuoles   GFP foci     GFP vacuoles 
3 h YPD      88                  51 (58%)      2 (2.3%)          5 (6.7%)     2 (2.3%) 
3 h SD-N     85                  23 (27.1%)    49 (57.6%)        52 (61.2%)   32 (37.6%) 
3 h VPA      123                 35 (28.5%)    47 (38.2%)        51 (41.5%)   44 (35.8%) 
[Figure 3 graphics not reproduced. Recoverable panel labels: a, 3 h YPD, 3 h SD-N, 3 h VPA; b, WT, atg1Δ; c, YPD, SD-N, YPD+VPA at 0, 30, 60, 120, 180 and 240 min; WT.] 


Figure 3 | GFP-Atg8 and Cherry-Ape1 cellular distributions in VPA-treated 
cells. a, Cherry::APE1 GFP::ATG8 cells were grown and shifted to YPD, 
nitrogen starvation (SD-N) or YPD+VPA medium for 3 h. Samples were 
processed for microscopy. The table shows numbers corresponding to the 
experiment. Percentage of fluorescence signals is presented and error bars 
represent the s.d. obtained from three independent experiments. DIC, 
differential interference contrast. Scale bars, 3 μm. b, pho8Δ60 and pho8Δ60 
atg1Δ cells were grown as in a. Pho8Δ60 activity was calculated by measuring 
alkaline phosphatase levels. Error bars represent s.d. calculated from five 
independent experiments. c, GFP::ATG8 and GFP::ATG8 atg1Δ cells were 
grown as in a. Cell samples were processed for western blot using anti-GFP 
antibody. Quantification is presented in Supplementary Fig. 3. 

Figure 4 | Sae2 in VPA-treated cells. a, 4HA-Sae2 was immunoprecipitated 
± VPA with anti-HA and subsequently ± anti-acetyl-Lysine. Eluate was 
analysed using anti-HA. Lane 1: input Sae2; 2: input Sae2-HA −VPA; 3: as in 2 
but +VPA; 4: 3 μl elution Sae2-HA −VPA after anti-HA 
immunoprecipitation (IP; input AcK-IP); 5: double amount of 4; 6: IP anti-AcK 
elution from anti-HA IP of Sae2-HA −VPA; 7: as in 6 but +VPA. b, SAE2::PK 
erg6Δ cells were treated as in Fig. 2a. After 30 min induction, VPA and PMSF 
were added or not. Samples were processed for western blot using anti-PK. 
c, Wild-type SAE2::PK, SAE2::PK atg1Δ and SAE2::PK atg19Δ cells were grown 
as in b. After 30 min induction, VPA was added or not and samples treated as in 
b. d, Wild-type SAE2::PK cells were grown as in b. After 120 min induction, 
rapamycin (200 ng ml⁻¹) was added or not and samples were treated as in b. 


Gcn5, Rpd3 and Hda1 control Sae2 levels 


We tested whether VPA-induced phenotypes can be recapitulated in 
rpd3 and hda1 mutants, altered in two class I and II HDACs, respec- 
tively*’. rpd3 hda1 double mutants showed hypersensitivity to 4NQO 
and hydroxyurea (HU; a DNA synthesis inhibitor) compared to single 
mutants and wild-type cells, whereas only rpd3 cells were hypersensi- 
tive to MMS (Fig. 5a). Damage-induced recombination was reduced 
in hda1 rpd3 mutants (data not shown). Hence, Rpd3 and Hda1 may 
partially substitute for each other to respond to DNA damage and to 
assist homologous recombination, and may be targeted by VPA. We 
analysed Rad53 phosphorylation in G1 or G2 arrested wild-type, 
hda1, rpd3 and hda1 rpd3 cells treated with 4NQO, MMS and HU 
and found that the double mutants failed to promote robust check- 
point activation (data not shown). Thus, both Hda1 and Rpd3 influ- 
ence Rad53 activation. In response to DSB formation, rpd3 and hda1 
rpd3 cells showed a severe and equivalent defect in Rad53 phosphor- 
ylation (Fig. 5b), and rpd3 cells were resection defective although less 
than double mutants. Hence, Hda1 and Rpd3 influence DSB proces- 
sing and signalling, although to a different extent. Sae2 failed to accu- 
mulate at wild-type levels in hda1 rpd3 cells after HO induction in G2 
(Fig. 5c). Thus, the impairment in checkpoint activation and DSB 

Figure 5 | Gcn5, Rpd3 and Hda1 influence Sae2 levels and cell survival in 
atg1 mutants in response to DNA damage. a, Survival of wild-type, rpd3Δ, 
hda1Δ and rpd3Δ hda1Δ strains after 4NQO, MMS and HU treatment. Error 
bars represent s.d. calculated from seven independent experiments. b, HO was 
induced in G2 wild-type, hda1Δ, rpd3Δ and hda1Δ rpd3Δ cells and western and 
Southern blot analyses were performed. c, Wild-type SAE2::PK and hda1Δ 
rpd3Δ SAE2::PK cells were grown as in b and western blot was performed as in 
Fig. 4b. d, e, Wild-type SAE2::PK and gcn5Δ SAE2::PK strains were grown as in 
b and after 30 min of HO induction either VPA (d) or rapamycin (e) was added 
or not. Western blot was performed. f, Percentage of viability of wild-type, 
atg1Δ, rpd3Δ hda1Δ and atg1Δ rpd3Δ hda1Δ strains. Error bars represent s.d. 
calculated from four independent experiments. 


processing and the Sae2 instability observed in VPA can be recapitu- 
lated in rpd3 hda1 mutants. The action of Rpd3 and Hda1 is counter- 
acted by the Gcn5 HAT (SAGA in mammals)**. We found that, in the 
absence of Gcn5, VPA and rapamycin-mediated destabilization of 
Sae2 were attenuated (Fig. 5d, e). 

The observations described earlier lead to the expectation that 
autophagy might contribute to the hda1 rpd3 sensitivity to DNA 
damaging agents. atg1 and wild-type cells exhibited comparable sur- 
vival rates in response to camptothecin (CPT) treatment whereas 
hda1 rpd3 cells exhibited hypersensitivity to CPT (Fig. 5f). ATG1 
ablation partially counteracted CPT-induced lethality in hda1 rpd3 
cells. Hence, in hda1 rpd3 mutants, the Atg1-mediated autophagic 
response contributes to cell lethality, perhaps by destabilizing key 
DNA damage response factors. 


Discussion 


We showed that class I and II HDACs influence the DNA damage 
response at three levels (Supplementary Fig. 4a): checkpoint activation 
throughout the cell cycle, DSB processing in G2/M and degradation of 
key recombination protein(s). The following considerations indicate 
that HDACs mediate a global DNA damage response. Firstly, besides 
Sae2, Cdk1, Ku, MRN, Blm’ and Rfa1 (our unpublished observations) 
are also acetylated. Secondly, Sae2 acetylation might influence its own 
accumulation, thus resembling certain phenotypes of sae2Δ cells such 
as the inability to remove Mre11 from the DSB site’. However, in 
contrast to sae2 mutants, checkpoint activation and DSB resection 
are markedly impaired by HDAC inhibition, indicating that Sae2 is 
not the only relevant target. Accordingly, Exo1 is also degraded in 
VPA, perhaps as a consequence of Sae2 destabilization. Thirdly, the 
DSB resection defects caused by HDAC inhibition might explain the 
lack of checkpoint signals (RPA filaments) in G2/M but not the check- 
point impairment in G1 and G2 cells treated with the ultraviolet- 
mimetic drug 4NQO*, as 4NQO-induced checkpoint signalling does 
not require DSB resection. Because all the damaging agents used lead to 
the accumulation of RPA filaments through different mechanisms, 
perhaps Rfa1 acetylation also influences checkpoint signalling. 

The Tip60 HAT positively influences ATM”. This observation 
seems at odds with the findings that class I and II HDACs stimulate 
ATR. It is possible that HATs and HDACs have both positive and 
negative roles depending on the checkpoint subpathway. However, as 
ATR activation depends on the processing of DSBs (that represent 
ATM signals), it is possible that HAT-mediated ATM activation is an 
indirect consequence of ATR inhibition. We note that after HDAC 
inhibition ATM/Tel1-mediated histone H2A phosphorylation is not 
affected (data not shown), probably because Mre11 maintains Tel1 
active by remaining loaded at the DSB. Moreover, the fact that Gcn5/ 
SAGA promotes Sae2 degradation implies that this HAT indirectly 
negatively influences ATR signalling, perhaps by generating acety- 
lated substrates for the autophagic pathway and/or by directly pro- 
moting autophagy. 

Histone 3 lysine 9, 14, 18, 23 (converted to glycines) and histone 4 
lysine 5, 8, 12, 16 (converted to arginines) mutants, altered in H3 and 
H4 acetylation, still activate the checkpoint (Supplementary Fig. 1c), 
thus indicating that H3 and H4 histone acetylation does not have a 
relevant role for checkpoint activation. 

HDAC inhibition induces autophagy through unknown processes 
and we show that HDAC impairment destabilizes Sae2 through an 
autophagic pathway. However, the magnitude of autophagy induction 
after HDAC inhibition is not as strong as that in nitrogen-starved cells. 
Tor1 counteracts autophagy™ and we showed that rapamycin affects 
Sae2 turnover. The finding that Gcn5/SAGA ablation counteracts 
Sae2 degradation in VPA- and rapamycin-treated cells pinpoints 
the HAT activity involved in this regulatory process. Intriguingly, 
gcn5 mutants are sensitive to rapamycin. Moreover, dna2 mutants 
are altered in DSB resection and require TOR1 overexpression” for 
suppression. A tantalizing hypothesis is that an excess amount of Tor1 
rescues dna2 mutants by counteracting autophagy-mediated Sae2 
destabilization. 

We propose that (Supplementary Fig. 4b) after DSB formation, the 
broken chromosome arm is relocated close to the nuclear envelope. 
Rpd3 and Hda1 will keep Sae2 in the deacetylated form that influences 
Mre11 dynamics at the DSB site. Sae2 is then released from the DSB 
site, perhaps as a multimeric form, and Gcn5-mediated acetylation 
shunts it into autophagy-mediated degradation. This last step might 
be needed to counteract extensive DSB resection and/or simply to 
eliminate Sae2 once the first step of DSB processing has been accomp- 
lished; we note that exposure to reactive oxygen species, ultraviolet and 
ionizing radiation, besides damaging DNA, can cause protein damage 
and protein-DNA crosslinking, and certain damaged repair proteins 
or crosslinked proteins might have to be destroyed to prevent cellular 
problems. Cells may use specific autophagy subpathways rather than 
recycle all the cellular components, including those that are not 
damaged. This notion is supported by the observation that Sae2 degradation 
depends on Atg19, a factor specific for certain types of selective 
autophagy. In any case, triggering unprogrammed autophagy-mediated 
turnover of key repair proteins, by inhibiting HDACs and/or 
Tor1, would contribute to sensitivity to DNA damage. CtIP is ubiquitinated 
and Sae2 levels under normal conditions increase after proteasome 
inhibition with MG132 (data not shown). These observations, 
besides indicating that Sae2 also undergoes ubiquitination, may 
account for the partial rescue of Sae2 levels in atg mutants. Crosstalk 
between ubiquitination and autophagy has previously been described. 
Future work will address the contribution of both pathways to DSB 
metabolism. 


METHODS SUMMARY 


Strains are listed in Supplementary Table 1. Growth conditions, synchronization 
and HO induction have previously been described. VPA was used at 10 mM 
unless otherwise indicated. Nitrogen starvation (SD-N) medium, viability and 
recombination analysis have previously been described. Error bars represent 
s.d. calculated from at least three independent experiments. FACS analysis, TCA 
extraction and SDS-PAGE have previously been described. For immunodetection 
of Rad53 we used EL7 and F9 antibodies. For Myc, HA, PK, PGK1, Flag and 
GFP we used the 9E10, 12CA5, V5-TAG, 22C5 and M2 antibodies, respectively. 
Protein quantification was normalized with respect to PGK1 and accumulation 
was calculated as the ratio of VPA- or rapamycin-treated to untreated cells. 
Resection experiments were previously described. Purified genomic DNA was 
digested with StyI (0.2-, 1.6-kb fragments) or NcoI (5.7 kb) and treated for 
Southern blot analysis. The density of the HO-cut band at t = 30 min (Supplementary 
Fig. 2a) was set to 100%. The total amount of DNA loaded in each 
sample was normalized by re-probing the blots with probe Control D located 
170 kb from the HO site. ChIP analysis was previously described. The fold 
enrichment of fragments located 0.2 kb and 1.6 kb from the DSB was calculated 
as the ratio between the value of the fragment of interest and the value of the 
fragment used as control (ARS305). The number obtained was divided by the 
same ratio calculated for the whole cell extract samples of each time point. 
Primers for resection and ChIP experiments are the same as used previously. 
Samples for microscopic analysis were fixed in 4% formaldehyde for 5 min at 
room temperature (21 °C) and washed in cold 1× PBS. Images were taken with an 
Olympus BX51 fluorescent microscope. Oil immersion 100× objective UPlan 
APO, NA 1.4 was used. We used an 800 ms exposure time for Cherry and 400 ms 
for GFP. Alkaline phosphatase activity was measured using the Pho8Δ60 assay as 
described. 
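
The two normalizations described in this summary (protein accumulation relative to PGK1, and ChIP fold enrichment relative to ARS305 and the whole-cell extract) reduce to ratios of ratios. The following is a minimal sketch, assuming the band or amplification signals are already available as plain numbers; the function and argument names are illustrative and are not taken from the original analysis.

```python
def accumulation(target_treated, pgk1_treated, target_untreated, pgk1_untreated):
    """PGK1-normalized protein level in treated cells relative to untreated cells."""
    return (target_treated / pgk1_treated) / (target_untreated / pgk1_untreated)


def fold_enrichment(ip_fragment, ip_control, wce_fragment, wce_control):
    """ChIP fold enrichment of a fragment near the DSB.

    The IP ratio (fragment of interest over the ARS305 control) is divided
    by the same ratio measured in the whole-cell extract (WCE) of that
    time point.
    """
    return (ip_fragment / ip_control) / (wce_fragment / wce_control)
```

Dividing by the whole-cell-extract ratio removes differences in how efficiently the two fragments are detected, so only the enrichment due to the immunoprecipitation remains.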

The unusual minimum of sunspot cycle 23 caused by 
meridional plasma flow variations 


Dibyendu Nandy, Andrés Muñoz-Jaramillo & Petrus C. H. Martens 


Direct observations over the past four centuries show that the 
number of sunspots observed on the Sun’s surface varies periodically, 
going through successive maxima and minima. Following 
sunspot cycle 23, the Sun went into a prolonged minimum characterized 
by a very weak polar magnetic field and an unusually 
large number of days without sunspots. Sunspots are strongly 
magnetized regions generated by a dynamo mechanism that 
recreates the solar polar field mediated through plasma flows. 
Here we report results from kinematic dynamo simulations which 
demonstrate that a fast meridional flow in the first half of a cycle, 
followed by a slower flow in the second half, reproduces both characteristics 
of the minimum of sunspot cycle 23. Our model predicts 
that, in general, very deep minima are associated with weak polar 
fields. Sunspots govern the solar radiative energy and radio flux, 
and, in conjunction with the polar field, modulate the solar wind, 
the heliospheric open flux and, consequently, the cosmic ray flux at 
Earth. 

The creation and emergence of tilted, bipolar sunspot pairs and their 
subsequent decay and dispersal through flux transport processes determine 
the properties of the solar magnetic cycle. The average tilt 
angle of the sunspots of cycle 23 did not differ significantly from earlier 
cycles. However, the axisymmetric meridional circulation of plasma, 
which is observationally constrained only in the upper 10% of the Sun, 
where it has an average poleward speed of 20 m s⁻¹, is known to have 
significant intra- and intercycle variation. The equatorward counterflow 
of this circulation in the solar interior is believed to have a crucial 
role; it governs the equatorward migration and spatiotemporal distribution 
of sunspots and determines the solar cycle period. We perform 
kinematic solar dynamo simulations to investigate whether 
internal meridional flow variations can produce deep minima between 
cycles in general, and, in particular, explain the observed characteristics 
of the minimum of cycle 23 (Supplementary Information): a comparatively 
weak dipolar field strength and an unusually long period without 
sunspots. 

We use a recently developed axisymmetric, kinematic solar dynamo 
model to solve the evolution equations for the toroidal and poloidal 
components of the magnetic field. This model has been further refined 
using a buoyancy algorithm that incorporates a realistic representation 
of bipolar sunspot eruptions following the double-ring formalism 
and qualitatively captures the surface flux transport dynamics leading to 
solar polar field reversal (including the observed evolution of the radial 
component of the Sun’s dipolar field). To explore the effect of changing 
meridional flows on the nature of solar minima, it is necessary to introduce 
fluctuations in the meridional flow. The large-scale meridional 
circulation in the solar interior is believed to be driven by Reynolds 
stresses and small temperature differences between the solar equator 
and poles; variations in the flows may be induced by changes in the 
driving forces or through the feedback of magnetic fields. The feedback 
is expected to be highest at the solar maximum (polar field minimum), 
when the toroidal magnetic field in the solar interior is the strongest. 
We therefore perform dynamo simulations by randomly varying the 
meridional flow speed between 15 and 30 m s⁻¹ (with the same amplitude 
in both the hemispheres) at the solar cycle maximum, and study 
its effect on the nature of solar cycle minima. Details of the dynamo 
model are described in Supplementary Information. 

Our simulations extend over 210 sunspot cycles corresponding to 
1,860 solar years; for each of these simulated cycles, we record the 
meridional circulation speed, the cycle overlap (which includes the 
information on the number of days with no sunspots) and the strength 
of the polar radial field at cycle minimum. Figure 1 shows the sunspot 
butterfly diagram and surface radial field evolution over a selected 40-yr 
slice of the simulation. Here cycle to cycle variations (mediated by 
varying meridional flows) in the strength of the polar field at minimum 
and the structure of the sunspot butterfly diagram are apparent, hinting 
that the number of spotless days during a minimum is governed by the 
overlap (or lack thereof) of successive cycles. 

We designate the minimum in activity following a given sunspot 
cycle, say n, as the minimum of n (because the sunspot eruptions from 
cycle n contribute to the nature of this minimum). We denote the 
amplitude of the meridional flow speed after the random change at 
the maximum of cycle n by v_n, which remains constant through the 
minimum of cycle n and changes again at the maximum of cycle n + 1. 
According to this convention, the speed during the first (rising) half of 
cycle n would be v_{n−1}. To explore the relationship between the varying 
meridional flow, the polar field strength and cycle overlap, we generate 
statistical correlations between these quantities separately for the 
northern and southern solar hemispheres from our simulations over 
210 sunspot cycles. We note that slight hemispheric asymmetries arise 
in the simulations owing to the stochastic nature of the active-region 
emergence process. 

Figure 1 | Simulated sunspot butterfly diagram with a variable meridional 
flow. Starting with the pioneering telescopic observations of Galileo Galilei and 
Christoph Scheiner in the early seventeenth century, sunspots have been 
observed more or less continuously up to the present. Except for the period AD 
1645–1715, known as the Maunder minimum, when hardly any sunspots were 
observed, the sunspot time series shows a cyclic variation going through 
successive epochs of maximum and minimum activity. This cyclic temporal 
variation in the latitude of sunspot emergence gives rise to the ‘butterfly’ 
diagram. In this simulated butterfly diagram, the green line shows the 
meridional flow speed, which is made to vary randomly between 15 and 
30 m s⁻¹ at sunspot maxima and to remain constant between maxima. The 
varying meridional flow induces cycle-to-cycle variations in both the amplitude 
and the distribution of the toroidal field in the solar interior from which 
bipolar sunspot pairs buoyantly erupt. This variation is reflected in the 
spatiotemporal distribution of sunspots, shown here as shaded regions (the 
lighter shade represents sunspots that have erupted from positive toroidal field 
and the darker shade represents those that have erupted from negative toroidal 
field). The sunspot butterfly diagram shows a varying degree of cycle overlap (of 
the ‘wings’ of successive cycles) at cycle minimum. The polar radial field 
strength (yellow, positive; blue, negative) is strongest at sunspot cycle minimum 
and varies significantly from one cycle minimum to another. 

Unexpectedly, we find that there is no correlation between the flow 
speed at a given minimum (say v_n) and cycle overlap (or the number of 
spotless days) during that minimum, and the polar field strength at that 
minimum is only moderately correlated with v_n (Fig. 2a, b). Because 
transport of magnetic flux by the meridional flow takes a finite time, 
it is likely that the characteristics of a given minimum depend on 
the flow speed at an earlier time. We find that this is indeed the case 
(Fig. 2c, d), with cycle overlap (or the number of spotless days) and the 
polar field strength at a given minimum, n, being strongly correlated 
with the flow speed v_{n−1} (that is, meridional flow during the early, 
rising, part of cycle n). We also find that the cycle overlap is moderately 
correlated and that the polar field strength is strongly correlated with 
the change in flow speed between the first and second halves of the cycle 
(Fig. 2e, f). Taken together, these results show that a fast flow during the 
early part of the cycle, followed by a relatively slower flow during the 
later, declining, part of the cycle, results in a deep solar minimum. 
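
The lagged comparison described above (a quantity at the minimum of cycle n against the flow speed of cycle n or of cycle n − 1) can be sketched as follows. This is only an illustration: the data are synthetic stand-ins rather than dynamo output, the toy response is an assumption built to depend on the previous cycle's speed, and the helper names are hypothetical.

```python
import numpy as np


def spearman_rho(x, y):
    """Spearman rank correlation (valid when neither sample contains ties)."""
    rank_x = np.argsort(np.argsort(x)).astype(float)
    rank_y = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


def lagged_rho(flow, quantity, lag):
    """Correlate quantity[n] with flow[n - lag].

    flow[n] stands for v_n, the speed set at the maximum of cycle n;
    quantity[n] is measured at the minimum of cycle n.
    """
    flow = np.asarray(flow, dtype=float)
    quantity = np.asarray(quantity, dtype=float)
    if lag == 0:
        return spearman_rho(flow, quantity)
    return spearman_rho(flow[:-lag], quantity[lag:])


# Synthetic stand-in data, NOT simulation output: 210 cycles whose flow speed
# is redrawn uniformly between 15 and 30 m/s at each maximum.
rng = np.random.default_rng(0)
v = rng.uniform(15.0, 30.0, size=210)

# Toy response that depends only on the previous cycle's speed plus noise.
toy = np.empty_like(v)
toy[0] = 22.5
toy[1:] = v[:-1] + rng.normal(0.0, 1.0, size=v.size - 1)

rho_same_cycle = lagged_rho(v, toy, lag=0)
rho_previous_cycle = lagged_rho(v, toy, lag=1)
```

In this toy setting the lag-1 correlation is strong while the lag-0 correlation is weak, which is the qualitative pattern the text reports for v_{n−1} versus v_n.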

The main characteristics of the minimum of solar cycle 23 are a large 
number of spotless days and a relatively weak polar field strength. In 
Fig. 3, we plot the polar field versus cycle overlap and find that very deep 
minima are in fact associated with relatively weak polar field strengths. 
Thus, the qualitative characteristics of the unusual minimum of sunspot 
cycle 23 are self-consistently explained in our simulations driven by 
changes in the Sun’s meridional plasma flow. Our model predicts that, 
in general, extremely deep solar minima—with a large number of spot- 
less days—would also be characterized by relatively weak solar polar 
field strengths. 

Figure 3 | Polar field strength versus cycle overlap at solar minimum. 
Simulated normalized polar field strength is plotted versus cycle overlap at 
sunspot cycle minimum. Spearman’s rank correlation estimate: r = 0.46, 0.47 
and P = 99.99%, 99.99% for data from the northern (crosses) and southern 
(circles) hemispheres, respectively. The results show that a deep solar minimum 
with a large number of spotless days is typically associated with a relatively weak 
polar field—as observed during the minimum of sunspot cycle 23. 


We find that our model results are robust with respect to reasonable 
changes in the driving parameters. Simulations with continuous flow 
variations (as opposed to discrete changes), relatively higher magnetic 
diffusivity and a different threshold for buoyant active-region eruption 
all yield qualitatively similar relationships between the nature of solar 
minima and flow speed variations (Supplementary Information). 

Valuable insights into our simulation results may be gained by 
invoking the physics of meridional-flow-mediated magnetic flux 
transport. A faster flow (v_{n−1}) before and during the first half of cycle 
n would sweep the poloidal field of the previous cycle quickly through 
the region of differential rotation responsible for toroidal field induction; 
this would allow less time for toroidal field amplification and 
would hence result in a sunspot cycle (n) which is not too strong. 
The fast flow, followed by a slower flow during the second half of cycle 
n and persisting to the early part of the next cycle, would also distance 
the two successive cycles (that is, successive wings in the sunspot 
butterfly diagram), contributing to a higher number of spotless days 
during the intervening minimum. Moreover, a strong flow during the 
early half of cycle n would sweep both the positive and the negative 
polarity sunspots of cycle n (erupting at mid to high latitudes) to the 
polar regions; therefore, lower net flux would be available for cancelling 
the polar field of the old cycle and building the field of the new cycle— 
resulting in a relatively weak polar field strength at the minimum of 
cycle n. We believe that a combination of these effects contributes to 
the occurrence of deep minima such as that of solar cycle 23. 

Independent efforts using surface flux transport simulations show 
that surface meridional flow variations alone (observed during solar 
cycle 23; see also Supplementary Information) are inadequate for repro- 
ducing the weak polar field of cycle 23 (ref. 28). Dynamo simulations— 
which encompass the entire solar convection zone—are therefore 
invaluable for probing the internal processes that govern the dynamics 
of the solar magnetic cycle, including the origin of deep minima such as 
that of cycle 23. We anticipate that NASA’s recently launched Solar 
Dynamics Observatory will provide more precise constraints on the 
structure of the plasma flows deep in the solar interior, which could 
be useful for complementing these simulations. 

Spin-orbit-coupled Bose-Einstein condensates 


Y.-J. Lin, K. Jiménez-García & I. B. Spielman 


Spin-orbit (SO) coupling, the interaction between a quantum 
particle’s spin and its momentum, is ubiquitous in physical systems. 
In condensed matter systems, SO coupling is crucial for the 
spin-Hall effect and topological insulators; it contributes to 
the electronic properties of materials such as GaAs, and is important 
for spintronic devices. Quantum many-body systems of ultracold 
atoms can be precisely controlled experimentally, and would 
therefore seem to provide an ideal platform on which to study SO 
coupling. Although an atom’s intrinsic SO coupling affects its 
electronic structure, it does not lead to coupling between the spin 
and the centre-of-mass motion of the atom. Here, we engineer SO 
coupling (with equal Rashba and Dresselhaus strengths) in a 
neutral atomic Bose-Einstein condensate by dressing two atomic 
spin states with a pair of lasers. Such coupling has not been realized 
previously for ultracold atomic gases, or indeed any bosonic 
system. Furthermore, in the presence of the laser coupling, the 
interactions between the two dressed atomic spin states are modified, 
driving a quantum phase transition from a spatially spin-mixed 
state (lasers off) to a phase-separated state (above a critical 
laser intensity). We develop a many-body theory that provides 
quantitative agreement with the observed location of the transition. 
The engineered SO coupling, equally applicable for bosons 
and fermions, sets the stage for the realization of topological insulators 
in fermionic neutral atom systems. 

Quantum particles have an internal ‘spin’ angular momentum; this 
can be intrinsic for fundamental particles like electrons, or a combina- 
tion of intrinsic (from nucleons and electrons) and orbital for composite 
particles like atoms. SO coupling links a particle’s spin to its motion, and 
generally occurs for particles moving in static electric fields, such as the 
nuclear field of an atom or the crystal field in a material. The coupling 
results from the Zeeman interaction −μ·B between a particle’s magnetic 
moment μ, parallel to the spin σ, and a magnetic field B present in 
the frame moving with the particle. For example, Maxwell’s equations 
dictate that a static electric field E = E_0 ẑ in the laboratory frame (at rest) 
gives a magnetic field B_SO = E_0(ħ/mc²)(−k_y, k_x, 0) in the frame of an 
object moving with momentum ħk = ħ(k_x, k_y, k_z), where c is the speed of 
light in vacuum and m is the particle’s mass. The resulting momentum-dependent 
Zeeman interaction −μ·B_SO(k) ∝ σ_x k_y − σ_y k_x is known as 
the Rashba SO coupling. In combination with the Dresselhaus coupling 
∝ −σ_x k_y − σ_y k_x, these describe two-dimensional SO coupling in solids 
to first order. 
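
Adding the two couplings with equal weight makes the k_y terms cancel, which is why the equal Rashba and Dresselhaus case is effectively one-dimensional. As a short worked step (the common prefactor κ is an illustrative symbol introduced here, not one used in the text):

```latex
\begin{aligned}
H_{\mathrm{R}} &= \kappa\,(\sigma_x k_y - \sigma_y k_x), \\
H_{\mathrm{D}} &= \kappa\,(-\sigma_x k_y - \sigma_y k_x), \\
H_{\mathrm{R}} + H_{\mathrm{D}} &= -2\kappa\,\sigma_y k_x .
\end{aligned}
```

The sum couples a single spin component, σ_y, to a single momentum component, k_x.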

In materials, the SO coupling strengths are generally intrinsic 
properties, which are largely determined by the specific material and 
the details of its growth, and are thus only slightly adjustable in the 
laboratory. We demonstrate SO coupling in an ⁸⁷Rb Bose-Einstein 
condensate (BEC) where a pair of Raman lasers create a momentum-sensitive 
coupling between two internal atomic states. This SO coupling 
is equivalent to that of an electronic system with equal contributions of 
Rashba and Dresselhaus couplings, and with a uniform magnetic field B 
in the ŷ–ẑ plane, which is described by the single-particle Hamiltonian: 

$$\hat{H} = \frac{\hbar^2 \hat{k}^2}{2m}\,\check{1} + \frac{\Omega}{2}\,\check{\sigma}_z + \frac{\delta}{2}\,\check{\sigma}_y + 2\alpha \hat{k}_x \check{\sigma}_y \qquad (1)$$

α parametrizes the SO-coupling strength; Ω = −gμ_B B_z and δ = −gμ_B B_y 
result from the Zeeman fields along ẑ and ŷ, respectively; and σ_{x,y,z} are 
the 2 × 2 Pauli matrices. Without SO coupling, electrons have group 
velocity v_x = ħk_x/m, independent of their spin. With SO coupling, their 
velocity becomes spin-dependent, v_x = ħ(k_x ± 2αm/ħ²)/m for spin |↑⟩ 
and |↓⟩ electrons (quantized along ŷ). In two recent experiments, this 
form of SO coupling was engineered in GaAs heterostructures where 
confinement into two-dimensional planes linearized the native cubic SO 
coupling of GaAs to produce a Dresselhaus term, and asymmetries in the 
confining potential gave rise to Rashba coupling. In one experiment a 
persistent spin helix was found, and in another the SO coupling was 
only revealed by adding a Zeeman field. 
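
The spin-dependent velocity quoted above follows from the SO term alone. A brief worked step, assuming Ω = δ = 0 and motion along x, so that the Hamiltonian is diagonal in the eigenbasis of σ_y with eigenvalues ±1:

```latex
\begin{aligned}
E_{\pm}(k_x) &= \frac{\hbar^2 k_x^2}{2m} \pm 2\alpha k_x, \\
v_x &= \frac{1}{\hbar}\frac{\partial E_{\pm}}{\partial k_x}
     = \frac{\hbar k_x}{m} \pm \frac{2\alpha}{\hbar}
     = \frac{\hbar}{m}\left(k_x \pm \frac{2\alpha m}{\hbar^2}\right).
\end{aligned}
```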

SO coupling for neutral atoms enables a range of exciting experi- 
ments, and importantly, it is essential in the realization of neutral atom 
topological insulators. Topological insulators are novel fermionic band 
insulators including integer quantum Hall states and now spin 
quantum Hall states that insulate in the bulk, but conduct in topologically 
protected quantized edge channels. The first-known topological 
insulators (integer quantum Hall states) require large 
magnetic fields that explicitly break time-reversal symmetry. In a 
seminal paper, Kane and Mele showed that in some cases SO coupling 
leads to zero-magnetic-field topological insulators that preserve time-reversal 
symmetry. In the absence of the bulk conductance that plagues 
current materials, cold atoms can potentially realize such an insulator 
in its most pristine form, perhaps revealing its quantized edge (in two 
dimensions) or surface (in three dimensions) states. To go beyond the 
form of SO coupling we created, almost any SO coupling, including 
that needed for topological insulators, is possible with additional 
lasers. 

To create SO coupling, we select two internal ‘spin’ states from 
within the ⁸⁷Rb 5S_{1/2}, F = 1 ground electronic manifold, and label 
them pseudo-spin-up and pseudo-spin-down in analogy with an electron’s 
two spin states: |↑⟩ = |F = 1, m_F = 0⟩ and |↓⟩ = |F = 1, 
m_F = −1⟩. A pair of λ = 804.1 nm Raman lasers, intersecting at 
θ = 90° and detuned by δ from Raman resonance (Fig. 1a), couple 
these states with strength Ω; here ħk_L = √2 πħ/λ and E_L = ħ²k_L²/2m are 
the natural units of momentum and energy. In this configuration, the 
atomic Hamiltonian is given by equation (1), with k_x replaced by a 
quasimomentum q and an overall E_L energy offset. Ω and δ give rise 
to effective Zeeman fields along ẑ and ŷ, respectively. The SO-coupling 
term 2E_L q σ_y/k_L results from the laser geometry, and α = E_L/k_L is set by 
λ and θ, independent of Ω (see Methods). In contrast with the electronic 
case, the atomic Hamiltonian couples bare atomic states |↑, k_x = q + k_L⟩ 
and |↓, k_x = q − k_L⟩ with different velocities, ħk_x/m = ħ(q ± k_L)/m. 
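
The natural units quoted above can be evaluated numerically. The sketch below assumes the standard crossed-beam geometry, k_L = (2π/λ) sin(θ/2), which reduces to √2 π/λ at θ = 90°; the general formula, the function name and the tabulated ⁸⁷Rb mass are assumptions added here for illustration.

```python
import math

H_PLANCK = 6.62607015e-34             # J s (exact SI value)
HBAR = H_PLANCK / (2.0 * math.pi)
ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg (CODATA 2018)
MASS_RB87 = 86.909180 * ATOMIC_MASS_UNIT  # kg


def recoil_units(wavelength_m, angle_deg, mass_kg):
    """Return (k_L in 1/m, E_L/h in Hz) for two beams crossing at angle_deg."""
    k_l = (2.0 * math.pi / wavelength_m) * math.sin(math.radians(angle_deg) / 2.0)
    e_l = (HBAR * k_l) ** 2 / (2.0 * mass_kg)
    return k_l, e_l / H_PLANCK


k_l, e_l_hz = recoil_units(804.1e-9, 90.0, MASS_RB87)
```

For these inputs E_L/h comes out at roughly 1.8 kHz, which sets the scale against which Ω and δ are measured in the rest of the text.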

The spectrum, a new energy-quasimomentum dispersion of the SO-coupled 
Hamiltonian, is displayed in Fig. 1b at δ = 0 and for a range of 
couplings Ω. The dispersion is divided into upper and lower branches 
E_±(q), and we focus on E_−(q). For Ω < 4E_L and small δ (see Fig. 2a), 
E_−(q) consists of a double well in quasimomentum, where the group 
velocity ∂E_−(q)/∂ħq is zero. States near the two minima are dressed 
spin states, labelled as |↑′⟩ and |↓′⟩. As Ω increases, the two dressed 
spin states merge into a single minimum and the simple picture of 
two dressed spins is inapplicable. Instead, that strong coupling limit 

Figure 1 | Scheme for creating SO coupling. a, Level diagram. Two 
λ = 804.1 nm lasers (thick lines) coupled states |F = 1, m_F = 0⟩ = |↑⟩ and 
|F = 1, m_F = −1⟩ = |↓⟩, differing in energy by a ħω_Z Zeeman shift. The lasers, 
with frequency difference Δω_L/2π = (ω_Z + δ/ħ)/2π, were detuned δ from the 
Raman resonance. |m_F = 0⟩ and |m_F = +1⟩ had a ħ(ω_Z − ω_q) energy 
difference; because ħω_q = 3.8E_L is large, |m_F = +1⟩ can be neglected. 
b, Computed dispersion. Eigenenergies at δ = 0 for Ω = 0 (grey) to 5E_L. When 
Ω < 4E_L the two minima correspond to the dressed spin states |↑′⟩ and |↓′⟩. 
c, Measured minima. Quasimomentum q_{↑,↓} of |↑′⟩, |↓′⟩ versus Ω at δ = 0, 
corresponding to the minima of E_−(q). Each point is averaged over about ten 
experiments; the uncertainties are their standard deviation. d, Spin-momentum 
decomposition. Data for sudden laser turn-off: δ ≈ 0, Ω = 2E_L (top image pair), 
and Ω = 6E_L (bottom image pair). For Ω = 2E_L, |↑′⟩ consists of |↑, k_x ≈ 0⟩ and 
|↓, k_x ≈ −2k_L⟩, and |↓′⟩ consists of |↑, k_x ≈ 2k_L⟩ and |↓, k_x ≈ 0⟩. 


effectively describes spinless bosons with a tunable dispersion relation 
with which we engineered synthetic electric and magnetic 
fields for neutral atoms. 
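
The change from a double well to a single minimum can be checked by diagonalizing the 2 × 2 atomic Hamiltonian numerically. The sketch below works in units of k_L and E_L and assumes the form described in the text (equation (1) with k_x replaced by q, plus an E_L offset); the function names and the grid search are illustrative choices. In this model, at δ = 0 the minima sit at q = ±k_L √(1 − (Ω/4E_L)²) for Ω < 4E_L.

```python
import numpy as np

SIGMA_Y = np.array([[0.0, -1.0j], [1.0j, 0.0]])
SIGMA_Z = np.array([[1.0, 0.0], [0.0, -1.0]], dtype=complex)
IDENTITY = np.eye(2, dtype=complex)


def hamiltonian(q, omega, delta=0.0):
    """2x2 Hamiltonian at quasimomentum q (units: k_L for q, E_L for energies)."""
    return ((q**2 + 1.0) * IDENTITY
            + 0.5 * omega * SIGMA_Z
            + (0.5 * delta + 2.0 * q) * SIGMA_Y)


def lower_branch(q_values, omega, delta=0.0):
    """Lower eigenenergy E_-(q) on a grid of quasimomenta."""
    return np.array([np.linalg.eigvalsh(hamiltonian(q, omega, delta))[0]
                     for q in q_values])


def minima(omega, delta=0.0, q_max=2.0, points=4001):
    """Quasimomenta of the local minima of E_-(q), found on a uniform grid."""
    q = np.linspace(-q_max, q_max, points)
    e = lower_branch(q, omega, delta)
    is_min = (e[1:-1] < e[:-2]) & (e[1:-1] < e[2:])
    return q[1:-1][is_min]
```

Calling `minima(2.0)` returns two quasimomenta (the dressed spin states), while `minima(5.0)` returns a single one near q = 0, in line with the Ω = 4E_L boundary stated in the text.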

In the absence of Raman coupling, atoms with spins |↑⟩ and |↓⟩ 
spatially mixed perfectly in a BEC. By increasing Ω we observed an 
abrupt quantum phase transition to a new state where the two dressed 
spins spatially separated, resulting from a modified effective interaction 
between the dressed spins. 

We studied SO coupling in oblate ⁸⁷Rb BECs with about 1.8 × 10⁵ 
atoms in a λ = 1,064-nm crossed dipole trap with frequencies (f_x, f_y, 
f_z) = (50, 50, 140) Hz. The bias magnetic field B_0 ŷ generated a ω_Z/2π 
≈ 4.81 MHz Zeeman shift between |↑⟩ and |↓⟩. The Raman beams 
propagated along ŷ ± x̂ and had a constant frequency difference Δω_L/2π 
≈ 4.81 MHz. The small detuning from the Raman resonance 
δ = ħ(Δω_L − ω_Z) was set by B_0, and the state |m_F = +1⟩ was 
decoupled owing to the quadratic Zeeman effect (see Methods). 

We prepared BECs with an equal population of |↑⟩ and |↓⟩ at Ω, 
δ = 0, then we adiabatically increased Ω to a final value up to 7E_L in 
70 ms, and finally we allowed the system to equilibrate for a holding 
time t_h = 70 ms. We abruptly (t_off < 1 μs) turned off the Raman lasers 
and the dipole trap, thus projecting the dressed states onto their 
constituent bare spin and momentum states, and absorption-imaged 
them after a 30.1-ms time of flight (TOF). For Ω > 4E_L (Fig. 1d), the 
BEC was located at the single minimum q_0 of E_−(q) with a single 
momentum component in each spin state corresponding to the pair 
{|↑, q_0 + k_L⟩, |↓, q_0 − k_L⟩}. However, for Ω < 4E_L we observed two 
momentum components in each spin state, corresponding to the 
two minima of E_−(q) at q_↑ and q_↓. The agreement between the data 
(symbols), and the expected minima locations (curves), demonstrates 


84 | NATURE | VOL 471 | 3 MARCH 2011 



Figure 2 | Phases of a SO-coupled BEC. a, b, Mean field phase diagrams for 
infinite homogeneous SO-coupled ⁸⁷Rb BECs (1.5-kHz chemical potential). 
The background colours indicate atom fraction in |↑⟩ and |↓⟩. Between the 
dashed lines there are two dressed spin states, |↑′⟩ and |↓′⟩. a, Single-particle 
phase diagram in the Ω–δ plane. b, Phase diagram (enlargement of the grey 
rectangle in a), as modified by interactions. The dots represent a metastable 
region where the fraction of atoms f_{↑′,↓′} remains largely unchanged for t_h = 3 s. 
c, Miscible-to-immiscible transition. Phase line for mixtures of dressed spins 
and images after TOF (with populations N_↑ ≈ N_↓), mapped from |↑′⟩ and |↓′⟩ 
showing the transition from phase-mixed to phase-separated within the 
‘metastable window’ of detuning. 


the existence of the SO coupling associated with the Raman dressing. 
We kept δ ≈ 0 when turning on Ω by maintaining equal populations in 
bare spins |↑⟩, |↓⟩ (see Fig. 1d). 

We experimentally studied the low-temperature phases of these 
interacting SO-coupled bosons as a function of Ω and δ. The zero-temperature 
mean-field phase diagram (Fig. 2a, b) includes phases 
composed of a single dressed spin state, a spatial mixture of both 
dressed spin states, and coexisting but spatially phase-separated 
dressed spins. 

This phase diagram can largely be understood as the result of non-interacting 
bosons condensing into the lowest-energy single particle 
state, and can be divided into three regimes (Fig. 2a). In the region of 
positive detuning marked |↓′⟩, there are double minima at q = q_↑, q_↓ in 
E_−(q) with E_−(q_↓) < E_−(q_↑) and the bosons condense at q_↓. In the 
region marked |↑′⟩ the reverse holds. The energy difference between 
the two minima is Δ(Ω, δ) = E_−(q_↑) − E_−(q_↓) ≈ δ for small δ (see 
Methods). In the third ‘single minimum’ regime, the atoms condense 
at the single minimum q_0. These dressed spins act as free particles with 
group velocity ħK_x/m* (with an effective mass m* ≈ m, for small Ω), 
where K_x = q − q_{↑,↓,0} for the different minima. 

We investigated the phase diagram using BECs with initially equal 
spin populations prepared as described previously, but with δ ≠ 0 and 
t_h up to 3 s. We probed the atoms after abruptly removing the dipole 
trap, and then ramping Ω → 0 in 1.5 ms. This approximately mapped 
|↑′⟩ and |↓′⟩ back to their undressed counterparts |↑⟩ and |↓⟩ (see 
Methods). We absorption-imaged the atoms after a 30-ms TOF, during 
the last 20 ms of which a Stern-Gerlach magnetic field gradient along ŷ 
separated the spin components. 

Figure 3a shows the condensate fraction f_{↓′} = N_{↓′}/(N_{↑′} + N_{↓′}) in |↓′⟩ 
at Ω = 0.6E_L as a function of δ, at t_h = 0.1 s, 1 s and 3 s, where N_{↑′} and 
N_{↓′} denote the number of condensed atoms in |↑′⟩ and |↓′⟩, respectively. 
The BEC is all |↑′⟩ for δ ≲ 0 and all |↓′⟩ for δ ≳ 0, but both 
dressed spin populations substantially coexisted for detunings within 
±w_δ (obtained by fitting f_{↓′} to the error function where δ = ±w_δ 
corresponds to f_{↓′} = 0.50 ± 0.16). Figure 3b shows w_δ versus Ω for 
hold times t_h. w_δ decreases with t_h; even by our longest t_h of 3 s it 
has not reached equilibrium. 
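
One way to write an error-function step whose width follows the convention above (the fraction departs from 0.50 by 0.16 at a detuning of ±w_δ) is sketched below. The parametrization, the bisection-based inverse and all names are assumptions made for illustration; the text does not specify the fitting code.

```python
import math


def inverse_erf(target, lo=0.0, hi=6.0, iterations=100):
    """Invert math.erf on [lo, hi] by bisection (erf is monotonic)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Scale chosen so that the fraction departs from 0.50 by 0.16 at |delta| = width.
KAPPA = inverse_erf(2.0 * 0.16)


def dressed_fraction(delta, width, center=0.0):
    """Error-function step from 0 to 1 in the detuning delta."""
    return 0.5 * (1.0 + math.erf(KAPPA * (delta - center) / width))
```

With this convention `dressed_fraction(width, width)` evaluates to 0.66 and `dressed_fraction(-width, width)` to 0.34, so a fitted `width` can be read directly as w_δ.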

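Conventional F = 1 spinor BECs have been studied in ²³Na and 
⁸⁷Rb without Raman coupling. For our |↑⟩ and |↓⟩ states, the 
interaction energy depends on the local density in each spin state, 
and is described by: 

$$\hat{H}_I = \frac{1}{2}\int d^3r \left[\left(c_0 + \frac{c_2}{2}\right)\left(\hat\rho_\uparrow + \hat\rho_\downarrow\right)^2 + \frac{c_2}{2}\left(\hat\rho_\downarrow^2 - \hat\rho_\uparrow^2\right) + \left(c_2 + c'_{\uparrow\downarrow}\right)\hat\rho_\uparrow\hat\rho_\downarrow\right] \qquad (2)$$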
where ρ̂_↑ and ρ̂_↓ are density operators for |↑⟩ and |↓⟩, and normal 
ordering is implied. In the ⁸⁷Rb F = 1 manifold, the spin-independent 
interaction is c_0 = 7.79 × 10⁻¹² Hz cm³, the spin-dependent interaction 
is c_2 = −3.61 × 10⁻¹⁴ Hz cm³, and c′_{↑↓} = 0. Because 
|c_0| ≫ |c_2|, the interaction is almost spin-independent, but c_2 < 0, so 
the two-component mixture of |↑⟩ and |↓⟩ has a spatially mixed ground 
state (is miscible). When Ĥ_I is re-expressed in terms of the dressed spin 
states, c′_{↑↓} = c_0 Ω²/(8E_L²) is non-zero and corresponds to an effective 
interaction between |↑′⟩ and |↓′⟩. This modifies the ground state of our 
SO-coupled BEC (mixtures of |↑′⟩ and |↓′⟩) from phase-mixed to 
phase-separated above a critical Raman coupling strength Ω_c. This 
transition lies outside the common single-mode approximation. 
The effective interaction between |↑′⟩ and |↓′⟩ is an exchange energy 
resulting from the non-orthogonal spin part of |↑′⟩ and |↓′⟩ (see 
Methods): a spatial mixture produces total density modulations with 
wavevector 2k_L, in analogy with the spin-textures of the electronic 
case. These increase the state-independent interaction energy in Ĥ_I 
wherever the two dressed spins spatially overlap, contributing to the 
c′_{↑↓} term. (Such a term does not appear for radio-frequency-dressed 

Figure 3 | Population relaxation. a, Condensate fraction f_{↓′} in |↓′⟩ at 
Ω = 0.6E_L versus detuning δ at t_h = 0.1, 0.5 and 3 s showing w_δ decrease with 
increasing t_h. The solid curves are fits to the error function from which we 
obtained the width w_δ. b, Metastable detuning width. Width w_δ versus Ω at 
t, = 0.1, 0.5 and 3 s; the data fits well to a[b + (Q/E,) 7] (dashed curves). 

states, which are always spin-orthogonal.) Because c′_{↑↓} and c_2 have 
opposite sign here, the dressed BEC can go from miscible to immiscible 
at the miscibility threshold for a two-component BEC, 
c_0 + c_2 + c′_{↑↓}/2 = √(c_0(c_0 + c_2)), when Ω = Ω_c (this result is in agreement 
with an independent theory presented in ref. 23). 
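
The threshold can be traced back to the interaction Hamiltonian in a few lines. As a hedged worked step, write the integrand as g_↑↑ ρ̂_↑² + g_↓↓ ρ̂_↓² + 2g_↑↓ ρ̂_↑ρ̂_↓ (the g symbols are introduced here for illustration), assume the standard two-component miscibility criterion g_↑↓² = g_↑↑ g_↓↓, and expand the square root to first order in c_2/c_0 with c′_{↑↓} = c_0 Ω²/(8E_L²):

```latex
\begin{aligned}
g_{\uparrow\uparrow} &= c_0, \qquad
g_{\downarrow\downarrow} = c_0 + c_2, \qquad
g_{\uparrow\downarrow} = c_0 + c_2 + \tfrac{1}{2}c'_{\uparrow\downarrow}, \\
g_{\uparrow\downarrow}^2 &= g_{\uparrow\uparrow}\,g_{\downarrow\downarrow}
\;\Longrightarrow\;
c_0 + c_2 + \tfrac{1}{2}c'_{\uparrow\downarrow} = \sqrt{c_0\,(c_0 + c_2)}, \\
\sqrt{c_0\,(c_0 + c_2)} &\approx c_0 + \tfrac{1}{2}c_2
\;\Longrightarrow\;
c'_{\uparrow\downarrow} \approx -c_2
\;\Longrightarrow\;
\Omega_c \approx \sqrt{-8c_2/c_0}\;E_L .
\end{aligned}
```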

Figure 2b depicts the mean-field phase diagram including interactions, computed by minimizing the interaction energy $\hat H_I$ plus the single-particle detuning $\Delta(\Omega, \delta) \approx \delta$. This phase diagram adds two new phases, mixed (hashed) and phase-separated (bold line), to those present in the non-interacting case. The $c_2(\hat\rho_\downarrow^2 - \hat\rho_\uparrow^2)/2$ term in $\hat H_I$ implies that the energy difference between a $|\uparrow\rangle$ BEC and a $|\downarrow\rangle$ BEC is proportional to $N^2 c_2$. The detuning required to compensate for this difference slightly displaces the symmetry point of the phase diagram downwards. As evidenced by the width of the metastable window $2w_\delta$ in Fig. 2b, for $|\delta| < w_\delta$ the spin population does not have time to relax to equilibrium. The miscibility condition does not depend on atom number, so the phase line in Fig. 2c shows the system's phases for $|\delta| < w_\delta$: phase-mixed for $\Omega < \Omega_c$ and phase-separated for $\Omega > \Omega_c$, where $\Omega_c \approx \sqrt{-8c_2/c_0}\,E_L \approx 0.19E_L$.
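The quoted critical coupling can be checked numerically. The sketch below is illustrative only: it assumes $c_0 = 7.79 \times 10^{-12}$ Hz cm³ and $c_2 = -3.61 \times 10^{-14}$ Hz cm³ (exponents inferred from the ratio implied by $\Omega_c \approx 0.19E_L$), measures $\Omega$ in units of $E_L$, and uses hypothetical function names.

```python
import math

# Assumed interaction coefficients in Hz cm^3 (see lead-in for the caveat).
C0 = 7.79e-12
C2 = -3.61e-14


def critical_coupling(c0, c2):
    """First-order estimate Omega_c / E_L = sqrt(-8 c2 / c0)."""
    return math.sqrt(-8.0 * c2 / c0)


def effective_interaction(c0, omega):
    """Dressed-state term c'_{up,down} = c0 * Omega^2 / (8 E_L^2), Omega in E_L."""
    return c0 * omega**2 / 8.0


def is_miscible(c0, c2, omega):
    """Two-component miscibility condition quoted in the text."""
    c_eff = effective_interaction(c0, omega)
    return c0 + c2 + c_eff / 2.0 < math.sqrt(c0 * (c0 + c2))


omega_c = critical_coupling(C0, C2)
print(f"Omega_c is about {omega_c:.3f} E_L")
```

With these assumed values the estimate lands close to the $0.19E_L$ stated above, and `is_miscible` changes from true to false as $\Omega$ crosses that value.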

We measured the miscibility of the dressed spin components from their spatial profiles after TOF, for $\Omega = 0$ to $2E_L$ and $\delta \approx 0$ such that $N_{T\uparrow'} \approx N_{T\downarrow'}$, where $N_{T\uparrow',T\downarrow'}$ is the total atom number including both the condensed and thermal components in $|\uparrow'\rangle$, $|\downarrow'\rangle$. For each TOF image, we numerically re-centred the Stern–Gerlach-separated spin distributions (Fig. 2c, and see Methods), giving condensate densities $n_{\uparrow'}(x, y)$ and $n_{\downarrow'}(x, y)$. Given that the self-similar expansion of BECs released from harmonic traps essentially magnifies the in situ spatial spin distribution, these reflect the in situ densities.

A dimensionless metric $s = 1 - \langle n_{\uparrow'} n_{\downarrow'}\rangle / \left(\langle n_{\uparrow'}^2\rangle\langle n_{\downarrow'}^2\rangle\right)^{1/2}$ quantifies the degree of phase separation (where $\langle\ldots\rangle$ is the spatial average over a single image). $s = 0$ for any perfect mixture $n_{\uparrow'}(x, y) \propto n_{\downarrow'}(x, y)$, and $s = 1$ for complete phase separation. Figure 4 displays $s$ versus Raman coupling $\Omega$ with a hold time $t_h = 3$ s, showing that $s \approx 0$ for small $\Omega$ (as expected given our miscible bare spins) and that $s$ abruptly increases above a critical $\Omega_c$. The inset to Fig. 4 plots $s$ as a function of time, showing that $s$ reaches steady state in 0.14(3) s, which is much less than $t_h$. To obtain $\Omega_c$, we fitted the data in Fig. 4 to a slowly increasing function below $\Omega_c$ and the power-law $1 - (\Omega/\Omega_c)^{-a}$ above $\Omega_c$. The resulting $\Omega_c = 0.20(2)E_L$ is in agreement with the mean-field prediction $\Omega_c = 0.19E_L$. This demonstrates a quantum phase transition for a two-component SO-coupled BEC, from miscible when $\Omega < \Omega_c$ to immiscible when $\Omega > \Omega_c$.

Figure 4 | Miscible to immiscible phase transition. Phase separation $s$ versus $\Omega$ with $t_h = 3$ s; the solid curve is a fit to the function described in the text. The power-law component of the fit has an exponent $a = 0.75 \pm 0.07$; this is not a critical exponent, but instead results from the decreasing size of the domain wall between the regions of $|\downarrow'\rangle$ and $|\uparrow'\rangle$ as $\Omega$ increases. Each point represents an average over 15 to 50 realizations and the uncertainties are the standard deviation. Inset, phase separation $s$ versus $t_h$, with $\Omega = 0.6E_L$, fitted to an exponential showing the rapid 0.14(3)-s timescale for phase separation.
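The phase-separation metric $s$ is straightforward to evaluate on a pair of density images. The following is a minimal sketch, assuming the two condensate densities are already re-centred and sampled on the same pixel grid; the function name and the synthetic gaussian test profiles are illustrative and are not taken from the paper's analysis code.

```python
import numpy as np


def phase_separation(n_up, n_down):
    """s = 1 - <n_up n_down> / sqrt(<n_up^2> <n_down^2>), averaged over one image."""
    n_up = np.asarray(n_up, dtype=float)
    n_down = np.asarray(n_down, dtype=float)
    overlap = np.mean(n_up * n_down)
    norm = np.sqrt(np.mean(n_up**2) * np.mean(n_down**2))
    return 1.0 - overlap / norm


# Synthetic example: a gaussian cloud on a square grid.
x, y = np.meshgrid(np.linspace(-3, 3, 121), np.linspace(-3, 3, 121))
cloud = np.exp(-(x**2 + y**2))

s_mixed = phase_separation(cloud, 2.0 * cloud)  # proportional profiles
s_separated = phase_separation(cloud * (x < 0), cloud * (x >= 0))  # disjoint halves
```

Proportional profiles give $s = 0$ up to rounding, and profiles with no spatial overlap give $s = 1$, matching the two limits stated in the text.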

Even below $\Omega_c$, $s$ slowly increased with increasing $\Omega$. To understand this effect, we numerically solved the two-dimensional spinor Gross–Pitaevskii equation in the presence of a trapping potential. This demonstrated that the differential interaction term $c_2(\hat\rho_\downarrow^2 - \hat\rho_\uparrow^2)/2$ in $\hat H_I$ favours slightly different density profiles for each spin component, while the $(c_2 + c'_{\uparrow\downarrow})\hat\rho_\uparrow\hat\rho_\downarrow$ term favours matched profiles. Thus, as $c_2 + c'_{\uparrow\downarrow}$ approached zero from below, this balancing effect decreased, causing $s$ to increase.

An infinite system should fully phase separate ($s = 1$) for all $\Omega > \Omega_c$. In our finite system, the boundary between the phase-separated spins, set by the spin-healing length ($\xi_s = \sqrt{\hbar^2 / 2m|c_2 + c'_{\uparrow\downarrow}|n}$, where $n$ is the local density), can be comparable to the system size. We interpret the increase of $s$ above $\Omega_c$ as resulting from the decrease of $\xi_s$ with increasing $\Omega$.

We realized SO coupling in an $^{87}$Rb BEC, and observed a quantum phase transition from spatially mixed to spatially separated. By operating at lower magnetic field (with a smaller quadratic Zeeman shift), our method extends to the full $F = 1$ or $F = 2$ manifold of $^{87}$Rb or $^{23}$Na, enabling a new kind of tuning for spinor BECs, without the losses associated with Feshbach tuning. Such modifications may allow access to the expected non-abelian vortices in some $F = 2$ condensates. Because our SO coupling is in the small $\Omega$ limit, this technique is practical for fermionic $^{40}$K, with its smaller fine-structure splitting and thus larger spontaneous emission rate. When the Fermi energy lies in the gap between the lower and upper bands (for example, Fig. 1b) there will be a single Fermi surface; this situation can induce p-wave coupling between fermions and more recent work anticipates the appearance of Majorana fermions.


METHODS SUMMARY
System preparation. Our experiments began with nearly pure $^{87}$Rb BECs of approximately $1.8 \times 10^5$ atoms in the $|F = 1, m_F = -1\rangle$ state confined in a crossed optical dipole trap. The trap consisted of a pair of 1,064-nm laser beams propagating along $\hat x - \hat y$ ($1/e^2$ radii of $w_{\hat x+\hat y} \approx 120\,\mu$m and $w_{\hat z} \approx 50\,\mu$m) and $-\hat x - \hat y$ ($1/e^2$ radii of $w_{\hat x-\hat y} \approx w_{\hat z} \approx 65\,\mu$m).

We prepared equal mixtures of $|F = 1, m_F = -1\rangle$ and $|1, 0\rangle$ using an initially off-resonant radio-frequency magnetic field $B_{\rm rf}(t)\hat x$. We adiabatically ramped $\delta$ to $\delta \approx 0$ in 15 ms, decreased the radio-frequency coupling strength $\Omega_{\rm rf}$ to about 150 Hz, which is much less than $\hbar\omega_q$, in 6 ms, and suddenly turned off $\Omega_{\rm rf}$, projecting the BEC into an equal superposition of $|m_F = -1\rangle$ and $|m_F = 0\rangle$. We subsequently ramped $\delta$ to its desired value in 6 ms and then linearly increased the intensity of the Raman lasers from zero to the final coupling $\Omega$ in 70 ms.
Magnetic fields. Three pairs of Helmholtz coils, orthogonally aligned along $\hat x + \hat y$, $\hat x - \hat y$ and $\hat z$, provided bias fields ($B_{x+y}$, $B_{x-y}$ and $B_z$). By monitoring the $|F = 1, m_F = -1\rangle$ and $|1, 0\rangle$ populations in a nominally resonant radio-frequency dressed state, prepared as above, we observed a short-time (less than about 10 min) root-mean-square field stability $g\mu_B B_{\rm rms}/h \lesssim 80$ Hz. The field drifted slowly on longer timescales (but changed abruptly when unwary colleagues entered through our laboratory's ferromagnetic doors). We compensated for the drift by tracking the radio-frequency and Raman resonance conditions.

The small energy scales involved in the experiment meant that it was crucial to minimize magnetic field gradients. We detected stray gradients by monitoring the spatial distribution of $|m_F = -1\rangle$–$|m_F = 0\rangle$ spin mixtures after TOF. Small magnetic field gradients caused this otherwise miscible mixture to phase-separate along the direction of the gradient. We cancelled the gradients in the $x$–$y$ plane with two pairs of anti-Helmholtz coils, aligned along $\hat x + \hat y$ and $\hat x - \hat y$, to $g\mu_B B'/h \lesssim 0.7\,\mathrm{Hz}\,\mu\mathrm{m}^{-1}$.

Full Methods and any associated references are available in the online version of the paper at www.nature.com/nature.
METHODS
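SO-coupled Hamiltonian. Our system consisted of an $F = 1$ BEC with a bias magnetic field along $\hat y$ at the intersection of two Raman laser beams propagating along $\hat x + \hat y$ and $-\hat x + \hat y$ with angular frequencies $\omega_L$ and $\omega_L + \Delta\omega_L$, respectively. The rank-1 tensor light shift of these beams produced an effective Zeeman magnetic field along the $z$ direction with Hamiltonian $\hat H_R = \Omega_R\hat\sigma_{3,z}\cos(2k_L\hat x + \Delta\omega_L t)$, where $\hat\sigma_{3,x,y,z}$ are the $3 \times 3$ Pauli matrices and we define $\hat 1_3$ as the $3 \times 3$ identity matrix. If we take $\hat y$ as the natural quantization axis (by expressing the Pauli matrices in a rotated basis $\hat\sigma_{3,y} \to \hat\sigma_{3,z}$, $\hat\sigma_{3,x} \to \hat\sigma_{3,y}$ and $\hat\sigma_{3,z} \to \hat\sigma_{3,x}$) and make the rotating wave approximation, the Hamiltonian for spin states $\{|m_F = +1\rangle, |0\rangle, |-1\rangle\}$ in the frame rotating at $\Delta\omega_L$ is:

$$\hat H_3 = \frac{\hbar^2\hat k^2}{2m}\hat 1_3 + \begin{pmatrix} 3\delta/2 + \hbar\omega_q & 0 & 0 \\ 0 & \delta/2 & 0 \\ 0 & 0 & -\delta/2 \end{pmatrix} + \frac{\Omega_R}{2}\hat\sigma_{3,x}\cos(2k_L\hat x) - \frac{\Omega_R}{2}\hat\sigma_{3,y}\sin(2k_L\hat x) \qquad (2)$$

As we justify below, $|m_F = +1\rangle$ can be neglected for large enough $\hbar\omega_q$, which gives the effective two-level Hamiltonian:

$$\hat H_2 = \frac{\hbar^2\hat k^2}{2m}\hat 1 + \frac{\delta}{2}\hat\sigma_z + \frac{\Omega}{2}\hat\sigma_x\cos(2k_L\hat x) - \frac{\Omega}{2}\hat\sigma_y\sin(2k_L\hat x)$$

for the pseudo-spins $|\uparrow\rangle = |m_F = 0\rangle$ and $|\downarrow\rangle = |-1\rangle$, where $\Omega = \Omega_R/\sqrt{2}$. After a local pseudo-spin rotation by $\theta(\hat x) = 2k_L\hat x$ about the pseudo-spin $\hat z$ axis followed by a global pseudo-spin rotation $\hat\sigma_z \to \hat\sigma_y$, $\hat\sigma_y \to \hat\sigma_x$ and $\hat\sigma_x \to \hat\sigma_z$, the $2 \times 2$ Hamiltonian takes the SO-coupled form:

$$\hat H_2 = \frac{\hbar^2\hat k^2}{2m}\hat 1 + \frac{\Omega}{2}\hat\sigma_z + \frac{\delta}{2}\hat\sigma_y + 2\,\frac{\hbar^2 k_L\hat k_x}{2m}\hat\sigma_y + E_L\hat 1$$

The SO term linear in $\hat k_x$ results from the non-commutation of the spatially dependent rotation about the pseudo-spin $\hat z$ axis and the kinetic energy.
Effective two-level system. For atoms in $|m_F = -1\rangle$ and $|m_F = 0\rangle$ with velocities $\hbar k_x/m \approx 0$ and Raman-coupled near resonance, $\delta \approx 0$, the $|m_F = +1\rangle$ state is detuned from resonance owing to the $\hbar\omega_q = 3.8E_L$ quadratic Zeeman shift. For $\delta/4E_L \ll 1$ and $\Omega < 4E_L$, we have $\Delta(\Omega, \delta) \approx \delta[1 - (\Omega/4E_L)^2]^{1/2}$.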
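The double-well structure implied by the SO-coupled two-level form can be explored numerically. The sketch below is illustrative only: it assumes the Hamiltonian $\hat H = (q^2 + 1) + (\Omega/2)\hat\sigma_z + (\delta/2 + 2q)\hat\sigma_y$ with energies in units of $E_L$ and quasimomentum $q$ in units of $k_L$, and the function names are hypothetical. At $\delta = 0$ the lower band should then have minima at $q = \pm[1 - (\Omega/4E_L)^2]^{1/2}k_L$ for $\Omega < 4E_L$.

```python
import numpy as np


def lower_band(q, omega, delta=0.0):
    """Lower eigenvalue of (q^2 + 1) + (omega/2) sigma_z + (delta/2 + 2q) sigma_y."""
    return q**2 + 1.0 - np.sqrt((omega / 2.0) ** 2 + (2.0 * q + delta / 2.0) ** 2)


def band_minimum(omega, delta=0.0, q_max=2.0, points=40001):
    """Locate the minimum of the lower band on the q >= 0 side by grid search."""
    q = np.linspace(0.0, q_max, points)
    return q[np.argmin(lower_band(q, omega, delta))]


omega = 0.6  # Raman coupling in units of E_L
q_numeric = band_minimum(omega)
q_exact = np.sqrt(1.0 - (omega / 4.0) ** 2)
q_small_omega = 1.0 - omega**2 / 32.0  # leading-order expansion in omega
```

For $\Omega \geq 4E_L$ the same grid search returns $q = 0$, reflecting the merging of the two wells into a single minimum.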

Effect of the neglected state. In our experiment, we focused on the two-level system formed by the $|m_F = -1\rangle$ and $|m_F = 0\rangle$ states. We verified the validity of this assumption by adiabatically eliminating the $|m_F = +1\rangle$ state from the full three-level problem. To second order in $\Omega$, this procedure modifies the detuning $\delta$ and SO-coupling strength $\alpha$ in equation (1) by:


52 (2 ed - 
2 4E, +hog 32 Ey 
In these expressions, we have retained only the largest term in a $1/\omega_q$ expansion. In our experiment, where $\hbar\omega_q = 3.8E_L$, $\delta$ is substantially changed at our largest coupling $\Omega = 7E_L$. To maintain the desired detuning $\delta$ in the simple two-level model (that is, $\Delta \approx \delta + \delta^{(2)} = 0$ in Fig. 1c), we changed $g\mu_B B_y$ by as much as $3E_L$ to compensate for $\delta^{(2)}$. We did not correct for the change to $\alpha$, which was always small.
Although both terms are small at the $\Omega = 0.2E_L$ transition from miscible to immiscible, slow drifts in $B_y$ prompted us to locate $\Delta = 0$ empirically from the equal-population condition, $N_{T\uparrow'} = N_{T\downarrow'}$. As a result, $\delta$ in equation (1) implicitly includes the perturbative correction $\delta^{(2)}$.
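Origin of the effective interaction term. The additional $c'_{\uparrow\downarrow}$ term in the interaction Hamiltonian for dressed spins directly results from transforming into the basis of dressed spins, which are:

$$|\uparrow', K_x\rangle \approx |\uparrow, k_x = K_x + q_\uparrow + k_L\rangle - \epsilon\,|\downarrow, k_x = K_x + q_\uparrow - k_L\rangle$$

and

$$|\downarrow', K_x\rangle \approx |\downarrow, k_x = K_x + q_\downarrow - k_L\rangle - \epsilon\,|\uparrow, k_x = K_x + q_\downarrow + k_L\rangle \qquad (3)$$

where $\hbar K_x/m$ is the group velocity, $K_x = q - q_\uparrow$ for $|\uparrow'\rangle$ and $K_x = q - q_\downarrow$ for $|\downarrow'\rangle$, and $\epsilon = \Omega/8E_L \ll 1$. Thus, in second quantized notation, the dressed field operators transform according to: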


Wy (rade) tee (7) 


and 
I) =dy) tee hy(0) 


where $q_\uparrow \approx -\sqrt{1 - 4\epsilon^2}\,k_L \approx -k_L$ and $q_\downarrow \approx \sqrt{1 - 4\epsilon^2}\,k_L \approx k_L$. Inserting the transformed operators into:


1 C2 % x AD fs « x * 


gives the interaction Hamiltonian (with normal ordering implied) for dressed spins, which can be understood order-by-order (both $c_2/c_0$ and $\epsilon$ are treated as small parameters). In this analysis, the terms proportional to $c_2$ are unchanged to the order of $c_2/c_0$, and we only need to evaluate the transformation of the spin-independent term (proportional to $c_0$). At $O(\epsilon)$ and $O(\epsilon^3)$ all the terms in the expansion include the high-spatial-frequency prefactors $e^{\pm 2ik_L x}$ or $e^{\pm 4ik_L x}$. For density distributions that vary slowly on the $1/k_L$ length scale these average to zero. The $O(\epsilon^2)$ term, however, has terms without these modulations, and is:


a (¢2 1 aes 
BA) = 5 [ar (sever 0 yy) 


giving rise to $c'_{\uparrow\downarrow} = c_0\Omega^2/(8E_L^2)$.

Mean field phase diagram. We compute the mean-field phase diagram for a ground-state BEC composed of a mixture of dressed spins in an infinite homogeneous system. This applies to our atoms in a harmonic trap in the limit of $R \gg \xi_s$, where $R$ is the system size, $\xi_s = \sqrt{\hbar^2/2m|c_2 + c'_{\uparrow\downarrow}|n}$ is the spin healing length and $n$ is the density. We first minimize the interaction energy $\hat H_I$ at fixed $N_{\uparrow',\downarrow'}$, with an effective interaction $c'_{\uparrow\downarrow}$ as a function of $\Omega$. The two dressed spins are either phase-mixed, both fully occupying the system's volume $V$, or phase-separated with a fixed total volume constraint $V = V_{\uparrow'} + V_{\downarrow'}$. For the phase-separated case, minimizing the free energy gives the volumes $V_{\uparrow'}$ and $V_{\downarrow'}$, determined by $N_{\uparrow',\downarrow'}$ and $V$. The interaction energy of a phase-mixed state is smaller than that of a phase-separated state for the miscibility condition $c_0 + c_2 + c'_{\uparrow\downarrow}/2 < \sqrt{c_0(c_0 + c_2)}$, corresponding to $\Omega < \Omega_c$. This condition is independent of $N_{\uparrow',\downarrow'}$: for any $N_{\uparrow',\downarrow'}$, the system is miscible at $\Omega < \Omega_c$. Then, at a given $\Omega$, we minimize the sum of the interaction energy and the single-particle energy from the Raman detuning, $(N_{\uparrow'} - N_{\downarrow'})\delta/2$, allowing $N_{\uparrow',\downarrow'}$ to vary. For the miscible case ($\Omega < \Omega_c$), the BEC is a mixture with fraction $N_{\downarrow'}/(N_{\uparrow'} + N_{\downarrow'}) \in (0, 1)$ only in the range of detuning $\delta \in (\delta_0 - w_\delta, \delta_0 + w_\delta)$, where $\delta_0 = c_2 n/2$, $w_\delta = |\delta_0|(1 - \Omega^2/\Omega_c^2)^{1/2}$ and $n = (N_{\uparrow'} + N_{\downarrow'})/V$. For the immiscible case ($\Omega > \Omega_c$), $w_\delta = (c_2/8c_0)c_2 n$ is negligibly small compared to $c_2 n$.
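As a consistency check, the critical coupling quoted in the main text follows from the miscibility condition by a first-order expansion in $c_2/c_0$. The short derivation below is a sketch that assumes $|c_2| \ll c_0$ and takes $c'_{\uparrow\downarrow} = c_0\Omega^2/(8E_L^2)$ from the preceding section:

```latex
\begin{align}
c_0 + c_2 + \tfrac{1}{2}c'_{\uparrow\downarrow}
  &< \sqrt{c_0(c_0 + c_2)}
   = c_0\sqrt{1 + c_2/c_0}
   \approx c_0 + \tfrac{1}{2}c_2 , \\
\Rightarrow\quad c'_{\uparrow\downarrow} &< -c_2
  \quad\Longleftrightarrow\quad
  \frac{c_0\,\Omega^2}{8E_L^2} < -c_2 , \\
\Rightarrow\quad \Omega &< \Omega_c \approx \sqrt{-\frac{8c_2}{c_0}}\;E_L .
\end{align}
```

The neglected second-order term in the expansion of the square root is $-c_2^2/(8c_0)$, which is of the same size as the small width $(c_2/8c_0)c_2 n$ quoted for the immiscible case.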

Figure 2b shows the mean-field phase diagram as a function of $(\Omega, \delta)$, where $\delta/E_L$ is displayed with a quasi-logarithmic scaling, $\mathrm{sgn}(\delta/E_L)[\log_{10}(|\delta/E_L| + |\delta_{\min}/E_L|) - \log_{10}|\delta_{\min}/E_L|]$, in order to display $\delta$ within the range of interest. This scaling function smoothly evolves from logarithmic, that is, approximately $\mathrm{sgn}(\delta/E_L)\log_{10}|\delta/E_L|$ for $|\delta| \gg \delta_{\min}$, to linear, that is, approximately $\delta$ for $|\delta| \ll \delta_{\min}$, where $\delta_{\min} = 0.001E_L = 1.5$ Hz.
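The quasi-logarithmic axis scaling is simple to reproduce. The following is a minimal sketch, assuming $\delta$ and $\delta_{\min}$ are both expressed in units of $E_L$; the function name is hypothetical and the default $\delta_{\min} = 0.001$ is taken from the text.

```python
import math


def quasi_log(delta, delta_min=1e-3):
    """sgn(delta) * [log10(|delta| + delta_min) - log10(delta_min)]."""
    if delta == 0:
        return 0.0
    magnitude = math.log10(abs(delta) + delta_min) - math.log10(delta_min)
    return math.copysign(magnitude, delta)


# Linear regime: for |delta| << delta_min the scaling is proportional to delta,
# with slope 1 / (delta_min * ln 10).
small = quasi_log(1e-6)

# Logarithmic regime: for |delta| >> delta_min the scaling tracks log10|delta|
# up to the constant offset -log10(delta_min).
large = quasi_log(1.0)
```

The function is odd in $\delta$ and passes smoothly through zero, which is what allows both signs of detuning to share one axis.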

In our measurement of the dressed spin fraction $f_{\downarrow'}$ (see Fig. 3a), $\delta = 0$ is determined from the $N_{T\uparrow'} = N_{T\downarrow'}$ condition. We identify this condition as $\delta = \delta_0$ and apply it for all hold times $t_h$. Because $|\delta_0| \approx 3$ Hz is below our approximately 80-Hz root-mean-square field noise, we are unable to distinguish $\delta_0$ from 0.
Recombining TOF images of dressed spins. To probe the dressed spin states (equation (3)), each of which is a spin and momentum superposition, we adiabatically mapped them into bare spins, $|\uparrow, k_x = q_\uparrow + k_L\rangle$ and $|\downarrow, k_x = q_\downarrow - k_L\rangle$, respectively. Then, in each image outside an approximately 90-μm-radius disk containing the condensate for each spin distribution, we fitted $n_{T\uparrow',T\downarrow'}(x, y)$ to a gaussian modelling the thermal background and subtracted that fit from $n_{T\uparrow',T\downarrow'}(x, y)$ to obtain the condensate two-dimensional density $n_{\uparrow',\downarrow'}(x, y)$. Thus, for each dressed spin we readily obtained the temperature, total number $N_{T\uparrow',T\downarrow'}$, and condensate densities $n_{\uparrow',\downarrow'}(x, y)$.

To analyse the miscibility from the TOF images where a Stern–Gerlach gradient separated individual spin states, we re-centred the distributions to obtain $n_{\uparrow'}(x, y)$ and $n_{\downarrow'}(x, y)$. This took into account the displacement due to the Stern–Gerlach gradient and the non-zero velocities $\hbar k_x/m$ of each spin state (after the adiabatic mapping). The two origins were determined in the following way: we loaded the dressed states at a desired coupling $\Omega$ but with detuning $\delta$ chosen to put all atoms in either $|\downarrow'\rangle$ or $|\uparrow'\rangle$. Because $q_{\uparrow,\downarrow} = \mp(1 - \Omega^2/32E_L^2)k_L$ (see Fig. 1c), these velocities $\hbar k_x/m = \hbar(q_\uparrow + k_L)/m$, $\hbar(q_\downarrow - k_L)/m$ depend slightly on $\Omega$, and our technique to determine the origin of the distributions accounts for this effect.
Calibration of Raman coupling. Both Raman lasers were derived from the same Ti:sapphire laser at $\lambda \approx 804.1$ nm, and were offset from each other by a pair of acousto-optic modulators driven by two phase-locked frequency synthesizers near 80 MHz. We calibrated the Raman coupling strength $\Omega$ by fitting the three-level Rabi oscillations between the $m_F = -1$, 0 and +1 states driven by the Raman coupling to the expected behaviour.
Quantum Metropolis sampling


K. Temme, T. J. Osborne, K. G. Vollbrecht, D. Poulin & F. Verstraete


The original motivation to build a quantum computer came from Feynman, who imagined a machine capable of simulating generic quantum mechanical systems, a task that is believed to be intractable for classical computers. Such a machine could have far-reaching applications in the simulation of many-body quantum physics in condensed-matter, chemical and high-energy systems. Part of Feynman's challenge was met by Lloyd, who showed how to approximately decompose the time evolution operator of interacting quantum particles into a short sequence of elementary gates, suitable for operation on a quantum computer. However, this left open the problem of how to simulate the equilibrium and static properties of quantum systems. This requires the preparation of ground and Gibbs states on a quantum computer. For classical systems, this problem is solved by the ubiquitous Metropolis algorithm, a method that has basically acquired a monopoly on the simulation of interacting particles. Here we demonstrate how to implement a quantum version of the Metropolis algorithm. This algorithm permits sampling directly from the eigenstates of the Hamiltonian, and thus evades the sign problem present in classical simulations. A small-scale implementation of this algorithm should be achievable with today's technology.

Since the early days of quantum mechanics, it has been clear that 
there is a fundamental difficulty in studying many-body quantum 
systems: the configuration space, or Hilbert space, of a collection of 
particles grows exponentially with the number of particles. Many of 
the important breakthroughs in quantum physics during the twentieth 
century resulted from efforts to address this problem, leading to fun- 
damental theoretical and numerical methods to approximate solutions 
of the many-body Schrödinger equation. However, most of these 
methods are limited to weakly interacting particles; unfortunately, it 
is precisely when the interactions are strong that the most interesting 
physics arises. Notable examples include high-transition-temperature 
superconductors, electronic structure in large molecules and quark 
confinement in quantum chromodynamics. 

This problem with configuration space is not unique to quantum 
mechanics: the task of simulating interacting classical particles is chal- 
lenging for the same reason. It was only with the advent of computers 
in the 1950s that a systematic way of simulating classical many-body 
systems was made possible. In their seminal paper’, Metropolis et al. 
devised a general method to calculate the properties of any substance 
comprising individual molecules with classical statistics. This paper is 
a cornerstone in the simulation of interacting systems and has had a 
huge influence on a wide variety of fields (see, for example, refs 4-6). 
The Metropolis method can also be used to simulate certain quantum 
systems by means of a ‘quantum-to-classical map”. Unfortunately, 
this quantum Monte Carlo method is only scalable when the mapping 
conserves the positivity of the statistical weights, and fails in the case of 
fermionic systems as a result of the infamous sign problem’. 

As the reality of quantum computers comes closer, it is crucial to 
revisit the original motivation of Feynman for building a quantum 
simulator and to develop a general method, suitable for quantum 
computing machines, to calculate the properties of any substance 
comprising interacting quantum molecules. Such an algorithm would 


have a multitude of applications. In quantum chemistry, it could be 
used to compute the electronic binding energy as a function of the 
coordinates of the nuclei, thus solving the central problem of interest. 
In condensed-matter physics, it could be used to characterize the phase 
diagram of the Hubbard model as a function of filling factor, inter- 
action strength and temperature. Finally, it could conceivably be used 
to predict the mass of elementary particles, solving a central problem in 
high-energy physics. 

The seminal work of Lloyd demonstrated that a quantum computer can reproduce the dynamical evolution of any quantum many-body system. It did not address, however, the crucial problem of initial conditions: how to prepare the quantum computer efficiently in a state of physical interest such as a thermal (Gibbs) or ground state. Ground states could in principle be prepared using the quantum phase estimation algorithm, but this method is in general not scalable, because it requires a variational state with a large overlap with the ground state. Methods are known for systems with frustration-free interactions and systems that are adiabatically connected to trivial Hamiltonians, but such conditions are not generically satisfied. Suggestions have been made of how a quantum computer could sample from the thermal state of a system. One is related to the Metropolis rule but left open the problem of how to overcome the no-cloning result and construct local updates that can be rejected. This shortcoming immediately leads to an exponential running time of the algorithm. A second approach to preparing thermal states is by simulating the system's interaction with a heat bath. However, this procedure seems to produce large errors when run on a quantum computer with finite resources, and a precise framework to describe these errors seems to be out of reach. Moreover, certain systems such as polymers, binary mixtures and critical spin chains experience extremely slow relaxation when put into interaction with a heat bath. The Metropolis dynamics solves this problem by allowing transformations that are not physically achievable, speeding up relaxation by many orders of magnitude and bridging the microscopic and relaxation timescales; this freedom is to a large extent responsible for the tremendous empirical success of the Metropolis method.

In this Letter, we propose a direct quantum generalization of the 
classical Metropolis algorithm and show how one iteration of the 
algorithm can be implemented in polynomial time on a quantum 
computer. Our quantum algorithm is not affected by the sign problem 
and can be used to prepare ground and thermal states of generic 
quantum many-body systems, bosonic and fermionic. Like the classical 
Metropolis algorithm, the quantum Metropolis algorithm is not 
expected to reach the ground state of an arbitrary Hamiltonian in 
polynomial time. The ability to prepare the ground state of a general 
Hamiltonian in polynomial time would allow the solution of quantum 
Merlin Arthur (QMA)-complete problems’”’*, which is highly 
unlikely. However, for realistic physical systems, the convergence rate 
of the classical Metropolis algorithm is often very good, and it is con- 
ceivable that the same is also true for the quantum Metropolis algo- 
rithm. It also inherits all the flexibility and versatility of the classical 
method, leading, for instance, to a quantum generalization of simu- 
lated annealing’. 
Figure 1 | Building blocks for the quantum algorithm. a, The first step of the quantum circuit: the input is an arbitrary state, $|\psi\rangle$, and two $r$-qubit registers initialized to $|0\rangle^r$. Quantum phase estimation, $\Phi$, is applied to the state and the second register. The energy value in this register is then copied to the first register by a sequence of controlled-NOT gates. An inverse quantum phase estimation ($\Phi^\dagger$) is then applied to the state and the second register. b, The elementary step in the quantum circuit: the input is the eigenstate $|\psi_i\rangle$ with energy register $|E_i\rangle$ and two registers initialized to $|0\rangle^r$ and $|0\rangle$, respectively. The unitary transformation $C$ is then applied, followed by a quantum phase estimation step and the coherent Metropolis gate $W$. The state evolves as follows: $|\psi_i\rangle|E_i\rangle|0\rangle|0\rangle \to C|\psi_i\rangle|E_i\rangle|0\rangle|0\rangle = \sum_k x_k^i|\psi_k\rangle|E_i\rangle|0\rangle|0\rangle \to \sum_k x_k^i|\psi_k\rangle|E_i\rangle|E_k\rangle|0\rangle \to \sum_k x_k^i\sqrt{f_k^i}\,|\psi_k\rangle|E_i\rangle|E_k\rangle|1\rangle + \sum_k x_k^i\sqrt{1 - f_k^i}\,|\psi_k\rangle|E_i\rangle|E_k\rangle|0\rangle$ with $f_k^i = \min(1, \exp(-\beta(E_k - E_i)))$. c, The binary measurement checks whether the energy of the state $|\psi_k\rangle$ is the same as the energy of the original one, $|\psi_i\rangle$. This is done by using an extra register containing phase estimation ancillas, a step that checks whether or not the energy is equal to $E_i$, and finally an undoing of the phase estimation step that preserves coherence.

To set the stage for the quantum Metropolis algorithm, let us first recall the classical version. We can assume for definiteness that the system is composed of $n$ two-level particles, that is, Ising spins. A lattice of 100 spins has $2^{100}$ different configurations, so it is inconceivable to average them all. The key insight of Metropolis et al. was to set up a rapidly mixing Markov chain obeying detailed balance that samples from the configurations with the most significant probabilities. This can be achieved by randomly transforming an initial configuration to a new one (for example by flipping a randomly selected spin): if the energy of the new configuration, $E_{\rm new}$, is lower than the original, $E_{\rm old}$, we retain the move, but if the energy is larger we retain the move only with probability $\exp(\beta(E_{\rm old} - E_{\rm new}))$, where $\beta$ is the inverse temperature.

The challenge we address is to set up a similar process in the quantum case, that is, to initiate an ergodic random walk on the eigenstates of a given quantum Hamiltonian with the appropriate Boltzmann weights. In analogy to a spin flip, the random walk can be realized by a random local unitary transformation, and the 'move' should be accepted or rejected following the Metropolis rule. There are, however, three obvious complications. First, we do not know what the eigenvectors of the Hamiltonian are (this is one of the problems that we want to solve). Second, certain operations, such as energy measurements, are fundamentally irreversible in quantum mechanics, but the Metropolis method requires rejecting, and hence undoing, certain transformations. Third, it is necessary to devise a criterion which proves that the fixed point of the quantum random walk is the Gibbs state.
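The classical Metropolis rule recalled here (accept a proposed spin flip with probability $\min(1, \exp(\beta(E_{\rm old} - E_{\rm new})))$) can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' code: it uses an open chain of Ising spins with unit ferromagnetic coupling, a choice made only because its exact mean energy, $-(n-1)\tanh\beta$, is known in closed form; all function names are hypothetical.

```python
import math
import random


def chain_energy(spins):
    """Energy of an open Ising chain with unit coupling: E = -sum_i s_i s_{i+1}."""
    return -sum(a * b for a, b in zip(spins, spins[1:]))


def metropolis_mean_energy(n_spins, beta, n_steps, seed=0):
    """Estimate the thermal mean energy by single-spin-flip Metropolis sampling."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n_spins)]
    energy = chain_energy(spins)
    total = 0.0
    for _ in range(n_steps):
        site = rng.randrange(n_spins)
        spins[site] = -spins[site]  # propose a flip
        new_energy = chain_energy(spins)
        # Accept with probability min(1, exp(beta * (E_old - E_new))).
        if rng.random() < math.exp(min(0.0, beta * (energy - new_energy))):
            energy = new_energy
        else:
            spins[site] = -spins[site]  # reject: restore the stored configuration
        total += energy
    return total / n_steps


estimate = metropolis_mean_energy(n_spins=4, beta=0.5, n_steps=50000)
exact = -3 * math.tanh(0.5)  # -(n - 1) tanh(beta) for the open chain
```

The rejection branch works only because the old configuration is still available in memory, which is exactly the step that has no direct quantum counterpart.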


To address the first obstacle, we assume for simplicity that the Hamiltonian has non-degenerate eigenvalues, $E_i$, and denote the corresponding eigenvectors $|\psi_i\rangle$. In the Supplementary Information, we show that those conditions are unnecessary. We can use the phase estimation algorithm to prepare a random energy eigenstate and measure the energy of a given eigenstate. Then each quantum Metropolis step (Fig. 1) takes as input an energy eigenstate $|\psi_i\rangle$ with known energy $E_i$ and applies a random local unitary transformation $C$, creating the superposition $C|\psi_i\rangle = \sum_k x_k^i|\psi_k\rangle$. The transformation $C$ could be a bit flip at a random location, as in the classical setting, or some other simple transformation. The phase estimation algorithm is then used in a coherent way, producing $\sum_k x_k^i|\psi_k\rangle|E_k\rangle$, where $|E_k\rangle$ is an extra register encoding the energy in binary format. At this point, we could measure the second register to read out the energy $E_k$ and accept or reject the move following the Metropolis prescription. However, such an energy measurement would involve an irreversible collapse of the wave function, making it impossible to return to the original configuration in the case of a reject step.

Classically, we overcome this second obstacle by keeping a copy of 
the original configuration in the computer’s memory, allowing a 
rejected move to be easily undone. Unfortunately, this solution is ruled 
out in the quantum setting by the no-cloning theorem”. The key to the 
solution is to engineer a measurement that reveals as little information 
as possible about the new state, and therefore only slightly disturbs it. 
This can be achieved by a measurement that only reveals one bit of 
information—accept or reject the move—rather than a full energy 


lo 
loy 
ly) 
loy 
Ea: iS Ba m4 
—_— —_———" —_— —-——_ “ss ——" 
E Q P Q P Q 


Figure 2 | Quantum Metropolis stochastic map. The circuit corresponds toa 
single application of the map €. The first step, E, prepares an eigenstate of the 
Hamiltonian. The second step, Q, measures whether we want to accept or reject 
the proposed update. In the case of rejection, the complete quantum circuit 

comprises a sequence of measurements of the Hermitian projectors Q; and P;. 
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The recursion is aborted whenever the outcome P, is obtained, which indicates 
that we have returned to a state with the same energy as the input. Because each 
iteration has a constant success probability, the overall probability of obtaining 
the outcome P, approaches one exponentially as the number of iterations 
increases. 
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Figure 3 | Decision tree for unwinding the measurement. Given an input 
state, |), we first perform phase estimation to collapse to an eigenstate with 
known energy, E. This graph represents the plan of action conditioned on the 
different measurement outcomes of the binary P; and Q; measurements. Each 


measurement. The circuit that generates this binary measurement is 
shown in Fig. 1b. It transforms the initial state $|\psi_i\rangle$ into 

where $f^i_k = \min(1, \exp(-\beta(E_k - E_i)))$. The state can be seen as a coherent 
superposition of accepting the update or rejecting it. The amplitudes 
$x^i_k \sqrt{f^i_k}$ correspond exactly to the transition probabilities, $|x^i_k|^2 f^i_k$, 
of the classical Metropolis rule. The measurement is completed by 
measuring the last qubit in the computational basis. The outcome 
$|1\rangle$ will project the other registers into the state $|\psi_i^+\rangle$. On obtaining this 
outcome, we can measure the second register to learn the new energy, 
$E_k$, and use the resulting energy eigenstate as input to the next 
Metropolis step. 

A measurement outcome $|0\rangle$ signals that the update must be rejected, 
so we must return to the input state, $|\psi_i\rangle$. As $|\psi_i^+\rangle$ is orthogonal to 
$|\psi_i^-\rangle$, we actually work in a simple two-dimensional subspace, that is, 
a qubit. In such a case, it is possible to go back to the initial state by an 
iterative scheme similar to one previously used in the context of QMA 
amplification. The circuit implementing this process is shown in 
Fig. 2. In essence, it repeatedly implements two binary measurements. 
The first is the one described in the previous paragraph. The second, 
after a basis change, determines whether or not the computer is in the 
eigenstate $|\psi_i\rangle$. A positive outcome to the latter measurement implies 
that we have returned to the input state, completing the rejection; in 
the case of a negative outcome, we repeat both measurements. Every 
sequence of these two measurements has a constant probability of 
achieving the rejection, so recursive repetition yields a success probability 
exponentially close to one (Fig. 3). 

The quantum Metropolis algorithm can be used to generate a 
sequence of m states, $|\psi_j\rangle$, $j = 1, \ldots, m$, that reproduce the statistical 
averages of the thermal state $\rho_G \propto e^{-\beta H}$ for any observable X: 

$$\frac{1}{m}\sum_{j=1}^{m} \langle\psi_j|X|\psi_j\rangle = \mathrm{Tr}[X\rho_G] + O(1/\sqrt{m})$$
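
The acceptance rule and the $O(1/\sqrt{m})$ convergence of sample averages can be illustrated with a purely classical toy model. The sketch below is not the quantum algorithm: it assumes a hypothetical four-level spectrum, a uniform (symmetric) proposal over levels, and an arbitrary inverse temperature, and it only shows how the rule min(1, exp(-β(E_k - E_i))) drives the sample mean towards the Gibbs average.

```python
import math
import random

def metropolis_energy_average(energies, beta, m, seed=0):
    """Classical Metropolis walk over energy levels; returns the sample mean energy."""
    rng = random.Random(seed)
    i = 0                                   # index of the current level
    total = 0.0
    for _ in range(m):
        k = rng.randrange(len(energies))    # symmetric proposal (assumed uniform)
        f = min(1.0, math.exp(-beta * (energies[k] - energies[i])))
        if rng.random() < f:                # accept the update with probability f
            i = k
        total += energies[i]                # on rejection the walk stays at level i
    return total / m

def gibbs_energy_average(energies, beta):
    """Exact thermal average of the energy for comparison."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return sum(e * w for e, w in zip(energies, weights)) / z

# Hypothetical spectrum and temperature, chosen only for illustration.
energies = [0.0, 1.0, 2.0, 3.0]
estimate = metropolis_energy_average(energies, beta=1.0, m=200_000)
exact = gibbs_energy_average(energies, beta=1.0)
print(f"sampled {estimate:.4f}, exact {exact:.4f}")
```

Increasing m by a factor of 100 should shrink the typical deviation between the two numbers by roughly a factor of 10, in line with the $O(1/\sqrt{m})$ term.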


To show that the fixed point of the quantum random walk is the Gibbs 
state, we made use of the theoretical framework of ‘quantum detailed 
balance’ (Supplementary Information). Let $\{|\psi_i\rangle\}$ be a complete basis of 
the physical Hilbert space and let $\{p_i\}$ be a probability distribution on this 
basis. Assume that a completely positive map, $\mathcal{E}$, obeys the condition 

$$\sqrt{p_n p_m}\,\langle\psi_i|\mathcal{E}(|\psi_n\rangle\langle\psi_m|)|\psi_j\rangle = \sqrt{p_i p_j}\,\langle\psi_m|\mathcal{E}(|\psi_j\rangle\langle\psi_i|)|\psi_n\rangle$$

Then $\sigma = \sum_i p_i |\psi_i\rangle\langle\psi_i|$ is a fixed point of $\mathcal{E}$. The quantum detailed 
balance condition only ensures that the thermal state $\rho_G$ is a possible 
fixed point of the quantum Metropolis algorithm. The uniqueness of 


node in the graph corresponds to an intermediate state in the algorithm. One 
iteration of the map is completed when we reach one of the final leaves labelled 
either ‘Accept’ or ‘Reject’. The sequence $E \to Q_1 \to L$ corresponds to accepting 
the update; all other leaves correspond to a rejection. 


this fixed point and the rate at which the algorithm converges to it 
depend on the choice of the set of random unitary transformations {C}. 
If the set of moves is chosen such that the map $\mathcal{E}$ is ergodic, the 
uniqueness of the fixed point is ensured. The Metropolis step obeys 
the quantum detailed balance condition if the probability of applying a 
specific transformation C is equal to the probability of applying its 
conjugate, $C^\dagger$. This can be seen as the quantum analogue of the classical 
symmetry condition for the update probability. In some cases, it even 
suffices to apply the same local unitary transformation at every step of 
the algorithm (Fig. 4). In this case, the single unitary transformation 
has to be Hermitian. The local unitary transformation can be seen to 
induce ‘non-local’ transitions between the eigenstates because it is fol- 
lowed by a phase estimation procedure. 
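
The classical symmetry condition mentioned above can be checked numerically. The following sketch is a classical analogue only, under assumed inputs (a hypothetical four-level spectrum and uniform symmetric proposals): it builds the Metropolis transition matrix and verifies both detailed balance and that the Gibbs distribution is a fixed point.

```python
import math

def metropolis_matrix(energies, beta):
    """Transition matrix for uniform symmetric proposals with Metropolis acceptance."""
    n = len(energies)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                accept = min(1.0, math.exp(-beta * (energies[j] - energies[i])))
                P[i][j] = accept / n      # propose j with probability 1/n, then accept
        P[i][i] = 1.0 - sum(P[i])         # rejected (or self) moves stay at i
    return P

def gibbs(energies, beta):
    """Normalized Boltzmann weights."""
    w = [math.exp(-beta * e) for e in energies]
    z = sum(w)
    return [x / z for x in w]

# Hypothetical spectrum and inverse temperature, for illustration only.
energies = [0.0, 0.5, 1.3, 2.0]
beta = 2.0
P = metropolis_matrix(energies, beta)
p = gibbs(energies, beta)
n = len(energies)

# Detailed balance: p_i P_ij == p_j P_ji for every pair of states.
max_violation = max(abs(p[i] * P[i][j] - p[j] * P[j][i])
                    for i in range(n) for j in range(n))

# Fixed point: (p P)_j == p_j for every state j.
pP = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
max_drift = max(abs(a - b) for a, b in zip(pP, p))
print(max_violation, max_drift)
```

Detailed balance alone does not guarantee uniqueness; in this toy chain uniqueness follows because every state can reach every other state, which plays the role of the ergodicity requirement.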

In conclusion, even though an implementation of this algorithm for 
full-scale quantum many-body problems may be out of reach with 
today’s technological means, the algorithm is scalable to system sizes 
that are interesting for actual physical simulations. In Supplementary 
Information, we describe a small-scale implementation of the algorithm 
that can be achieved with present-day technology. Moreover, a discussion 
is included that sketches the basic steps necessary for the simulation 
of some notoriously hard quantum many-body problems. As in 
the classical setting, the convergence rate and, hence, the run-time of the 
algorithm are dictated by the spectral gap of the stochastic map. The 
scaling of the gap depends on the Hamiltonian in the problem and the 
choice of updates, {C}. Just as for the classical Metropolis algorithm, 
efficient thermalization is not expected for an arbitrary Hamiltonian, 
because it would allow the solution of QMA-complete problems in polynomial 
time. It is, however, expected that the algorithm will thermalize 
for realistic physical systems. The inverse gap of the quantum 
Metropolis map for the XX chain in a transverse magnetic field at zero 
temperature with a simple, single spin-flip update is shown in Fig. 4. 
This plot indicates that the gap scales like O(1/N), where N is the 
number of spins, even at criticality. To prove a polynomial scaling of 

Figure 4 | Inverse spectral gap of the completely positive map for the 
quantum Ising model. Inverse gap, $1/\Delta$, of the quantum Metropolis map at 
zero temperature as a function of the number of spins, N, in a chain with 
Hamiltonian $H = \sum_k X_k X_{k+1} + Y_k Y_{k+1} + g Z_k$. The update rule is a single spin 
flip, X,. The observed linear scaling indicates that, at least in the case of one- 
dimensional spin chains with nearest-neighbour Hamiltonians, the quantum 
Metropolis algorithm seems to converge in polynomial time. Proving this 
remains an interesting open problem. 

the gap for more complex Hamiltonians remains a challenging open 
problem. Also, it is well known that the choice of updates, {C}, can have 
a drastic impact on the convergence rate of the Markov chain in the 
classical setting. Finding good updates in the quantum setting is a very 
interesting open question, although the above example suggests that the 
problem might be simpler in the quantum case than in the classical case. 
The algorithm can be seen as a classical random walk on the eigenstates 
of the Hamiltonian. All samples are thus computed with respect to the 
actual eigenstates. This is why our method is suitable for the simulation 
of fermionic systems by exploiting the Jordan-Wigner transforma- 
tion’®, as discussed in ref. 27. The fermionic sign problem is therefore 
not an issue for the quantum Metropolis algorithm. It is worth noting 
that an additional quadratic speed-up might be achievable using the 
methods of refs 28-30. 

Synchronicity of Antarctic temperatures and local 
solar insolation on orbital timescales 


Thomas Laepple, Martin Werner & Gerrit Lohmann 


The Milankovitch theory states that global climate variability on 
orbital timescales from tens to hundreds of thousands of years is 
dominated by the summer insolation at high northern latitudes’”. 
The supporting evidence includes reconstructed air temperatures 
in Antarctica that are nearly in phase with boreal summer insola- 
tion and out of phase with local summer insolation**. Antarctic 
climate is therefore thought to be driven by northern summer 
insolation®. A clear mechanism that links the two hemispheres 
on orbital timescales is, however, missing. We propose that key 
Antarctic temperature records derived from ice cores are biased 
towards austral winter because of a seasonal cycle in snow accu- 
mulation. Using present-day estimates of this bias in the ‘recorder’ 
system, here we show that the local insolation can explain the 
orbital component of the temperature record without having to 
invoke a link to the Northern Hemisphere. Therefore, the Antarctic 
ice-core-derived temperature record, one of the best-dated records 
of the late Pleistocene temperature evolution, cannot be used to 
support or contradict the Milankovitch hypothesis that global 
climate changes are driven by Northern Hemisphere summer 
insolation variations. 

Reconstructions of Antarctic local temperature based on the mea- 
sured ratio of stable water isotopes within deep ice cores show a strong 
signature of orbital insolation variability. Recently, a new absolute 
dating technique enabled the phasing between the Antarctic temperature 
and the insolation to be determined and hence inferences could 
be made about the Milankovitch theory’. Because the tilt of the Earth, 
corresponding to the obliquity parameter, influences the high latitudes 
on both hemispheres in an identical way, the strong obliquity com- 
ponent found in the temperature reconstructions*° does not help to 
distinguish between a local or a remote insolation forcing. However, 
the precession component, which is out of phase between the Northern 
Hemisphere and the Southern Hemisphere, was found to be coherent 
and nearly in phase with the Northern Hemisphere summer insolation 
intensity (21 June insolation at 65° N)°’. This nearly in-phase relation- 
ship therefore puts the Antarctic climate response into the early group of 
responses according to the SPECMAP classifications” and would sup- 
port the Milankovitch theory’ that the global climate is driven by insola- 
tion changes at high northern latitudes°. However, these findings pose 
the question of how the Northern Hemisphere solar forcing is trans- 
ferred to the Southern Hemisphere, and why Southern Hemisphere 
local insolation changes have no imprint on the Antarctic temperature 
record. Variations in greenhouse gas concentrations are too weak to 
explain the interhemispheric link’; there exists no evidence that atmo- 
spheric dynamics can directly transfer the orbital signal to the Southern 
Hemisphere’, and changes in the thermohaline circulation are thought 
to favour an asymmetric pattern’®. 

To explain the observed phasing of the precession signal in the 
Antarctic temperature records, different possible mechanisms have 
been proposed: (1) When the summer solstice in one hemisphere 
occurs in perihelion (the Earth’s closest point to the Sun), the summer 
solstice on the other hemisphere occurs in aphelion (the Earth’s furthest 
point from the Sun). A comparison of the summer insolation of both 


hemispheres with the temperature record leads to the hypothesis that 
the Antarctic climate on orbital timescales is paced by northern summer 
insolation’. (2) The summer insolation intensity and summer duration 
are out of phase in the precessional band because by Kepler’s law a 
closer pass to the Sun (higher summer intensity) must be faster (shorter 
summer). It was recently proposed that Antarctic temperatures respond 
sensitively to changes of the local summer duration’. However, this 
nonlinear response to insolation cannot be fully confirmed by present-day 
observations (Supplementary Note 7). (3) Changes of Southern 
Ocean sea-ice coverage by variations in winter or spring insolation 
could affect the Antarctic temperature by modifying the heat transport 
from the Southern Ocean to the ice sheet and would be in phase with the 
isotopic record, depending on the definition of the season. In initial 
climate-model simulation experiments, this sea-ice effect on the annual 
mean temperature in Antarctica is minor. 

Here we propose an alternative mechanism, related to the interplay of 
a seasonal cycle in the accumulation of Antarctic snow (that is, the 
‘recording system’) with the seasonal variations of the precession com- 
ponent of the incoming solar insolation. It is based on the idea that the 
snow accumulation on the Antarctic Plateau has a minimum in austral 
summer, and therefore the recorded temperature signal is biased towards 
the remaining seasons. 

The measured temporal variations of the isotopic composition 
within Antarctic ice cores on glacial–interglacial timescales can be 
safely interpreted as accumulation-weighted temperature changes". 
Numerous authors have drawn attention to the possible influence of 
a change in the seasonal precipitation distribution during glacial- 
interglacial cycles on the isotopic record’*”*, and it has been shown 
that this effect significantly biases the Greenland temperature records 
but probably has only a minor effect on Antarctica'’. There are other 
biasing effects of the interplay of temperature and precipitation, 
namely the intermittent precipitation behaviour, which increases the 
interannual variability in the isotopic record’’, and the link of precipi- 
tation to specific weather patterns that are not representative for the 
mean temperature’’. However, not only changes in the seasonality of 
the precipitation, but also the complementary effect, that is, changes of 
the seasonal cycle of temperature together with a stable seasonality in 
precipitation, could influence the record”. 

For present-day climate, the surface temperature on the East- 
Antarctic Plateau is largely determined by the local insolation’. This 
is supported by the seasonal cycle in temperature, which strongly fol- 
lows the incoming radiation (Fig. 1). When we introduce a small time 
lag of 7-9 days (caused by the thermal capacity), the daily insolation 
explains 96-99% of the variance of the surface air-temperature cycle 
measured by the Automatic Weather Stations at Dome C”, Dome Fuji” 
and Vostok (Fig. 1). We can therefore approximate the local temperature 
by the insolation. Following this argument, the accumulation-weighted 
insolation is used in this study as a proxy for the isotopically 
derived temperature record. 
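
The lag analysis described above can be sketched as a search for the shift that maximizes the explained variance between two daily series. The example below uses synthetic data only (not the station records): the insolation curve, the offset of -65 °C and the imposed 8-day lag are assumptions made for illustration, and the slope of 0.067 °C per W m⁻² is borrowed from the seasonal sensitivity quoted in the text.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_lag(insolation, temperature, max_lag):
    """Return (lag_days, r_squared) maximizing explained variance.

    Both series are treated as one periodic year of daily values.
    """
    n = len(insolation)
    best = (0, -1.0)
    for lag in range(max_lag + 1):
        shifted = [insolation[(d - lag) % n] for d in range(n)]
        r2 = pearson_r(shifted, temperature) ** 2
        if r2 > best[1]:
            best = (lag, r2)
    return best

# Synthetic year: insolation is clipped at zero to mimic the polar night.
insolation = [max(0.0, 250.0 + 300.0 * math.cos(2 * math.pi * d / 365))
              for d in range(365)]
true_lag = 8  # assumed lag in days
temperature = [-65.0 + 0.067 * insolation[(d - true_lag) % 365]
               for d in range(365)]

lag, r2 = best_lag(insolation, temperature, max_lag=30)
print(lag, round(r2, 4))
```

With real station data the maximum of r_squared would be below one, and its value at the best lag corresponds to the fraction of variance explained.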

Owing to low local precipitation and strong wind-drift, a direct 
measurement of seasonal accumulation on the Antarctic Plateau is 
difficult’. However, several different accumulation estimates for the 

Figure 1 | Relationship of daily local insolation and surface air temperature. 
a, Dome Fuji, Automatic Weather Station data (February 1995 to November 
2006). b, Dome C, Automatic Weather Station data (January 1984 to December 
1994). c, Vostok, monthly data from the Arctic and Antarctic Research Institute 
(AARI) (1958-2007). We removed the time lag between insolation and 
temperature (9 days in a, 7 days in b, no time lag in c because monthly data are 
used). We could not detect any systematic deviations from a linear relationship 
(especially no higher sensitivity to lower temperatures as proposed in a recent 
study'’, see Supplementary Note 7). This supports the concept of using the 
insolation as an approximation for the surface air temperature. 


East Antarctic Plateau (Methods Summary) show a strong similarity, 
with a minimum in accumulation in summer (Fig. 2a). It has been 
argued that the winter maxima in accumulation on the East Antarctic 
Plateau are caused by the gentle terrain slopes, which leave radiative 
cooling as the primary mechanism to maintain saturation and cause 
clear-sky precipitation, as well as by the increased moisture transport 
caused by the weather systems in the local winter****. Furthermore, 
summer ablation reduces the summer accumulation”. 

For glacial climates, at present, no direct measurements of the 
seasonality of accumulation on the East Antarctic Plateau exist. 
However, it seems safe to assume that the seasonality of precipitation 
has been rather stable over time because the present-day temperatures 
on the plateau are far below the freezing point and therefore any effect 
of the additional glacial cooling on precipitation and sublimation is 
probably limited. Furthermore, Antarctic boundary conditions, such 
as the circular symmetry and the topography, are only weakly affected 
by glacial–interglacial changes. For the remainder of this study we use 
the mean over the different data sets as a best guess for the seasonal 
cycle of the accumulation on the East Antarctic Plateau. However, the 
following results are also obtained when any one of the individual time series is used 
(Supplementary Note 1). Folding the mean seasonal accumulation 
cycle with the insolation anomaly between the precession phases leads 
to a positive net anomaly of the accumulation-weighted insolation 
forcing (Fig. 2b, Methods Summary). This weighted insolation signal 
has the opposite phase to the precession component of the local summer 
anomaly and is in phase with Northern Hemisphere summer intensity. 
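
Why a zero-mean insolation anomaly can leave a non-zero weighted signal can be seen in a minimal worked example. It rests on a simplifying assumption that is not part of the study: both the precession-related anomaly and the accumulation cycle are taken to be single annual harmonics, with t measured from the austral summer solstice, a the anomaly amplitude in summer and b the relative depth of the summer accumulation minimum.

```latex
\Delta W(t) = a \cos\!\left(\frac{2\pi t}{T}\right), \qquad
A(t) = \bar{A}\left[1 - b \cos\!\left(\frac{2\pi t}{T}\right)\right], \qquad 0 < b \le 1

\frac{1}{T}\int_T \Delta W(t)\,dt = 0, \qquad
\frac{\int_T \Delta W(t)\,A(t)\,dt}{\int_T A(t)\,dt}
  = \frac{-a b \bar{A}\,T/2}{\bar{A}\,T}
  = -\frac{a b}{2}
```

Under these assumptions the weighted anomaly has the opposite sign to the summer anomaly: for a < 0 (weaker local summer insolation) the accumulation-weighted value is positive, which is the sign reversal described in the text.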

To quantitatively analyse this effect, we focus on the Dome Fuji 
temperature reconstruction ($T_\mathrm{site}$, Fig. 3a) because this record is based 
on an absolute chronology. The results are not sensitive to this choice 
and also apply for the other long temperature reconstructions on the 
East Antarctic plateau, Vostok and Dome C (Supplementary Note 5). 
We weight the daily insolation at the position of the core (77° S) (ref. 25) 
with the mean local accumulation displayed in Fig. 2. This accumulation-weighted 
insolation shows temporal variations very similar to those of 
the local temperature reconstruction from Dome Fuji in the 
orbital bands (Fig. 3b) (R = 0.70, $T_\mathrm{site}$ bandpass-filtered between 
1/15,000 years and 1/50,000 years). The main deviations from the 
data are during deglaciations, where our model cannot capture the 
strong amplitude of the $T_\mathrm{site}$ variations. Furthermore, the local 
weighted insolation is nearly indistinguishable from the Northern 
Hemisphere summer insolation, which shows a similar relationship 
to the Dome Fuji temperature reconstruction (Fig. 3c, R = 0.70). In 
contrast to these two hypotheses, the local maximum summer insolation 
(Fig. 3d) cannot explain the data because the precession has the 
inverse phase to the observation (R = −0.37), and the local 

Figure 2 | Seasonal cycle of accumulation and insolation anomaly. 

a, Monthly accumulation fractions are shown as coloured lines. Thin black line, 
Vostok precipitation from corrected gauge measurements (AARI) (1958- 
2007). Cyan line, Vostok snow-stake-based accumulation (1970-1995). Red 
line, P minus E reanalysis moisture budget from the European Centre for 
Medium-Range Weather Forecasts (ECMWF) for altitudes exceeding 2,500 m. 
Green line, precipitation estimate at Mizuho station (70.6° S, 44.3° E) (1980). 
Blue line, P minus E rawinsonde-based moisture budget for the area 0-55° E, 
69.3-76.8° S. Orange line, constant precipitation plus 50% sublimation in NDJF 
(that is, November to February), following observations at Dome Fuji. The 
thick black line joining points is the average of all estimates. References to these 
data sets can be found in Supplementary Note 1. b, Unweighted (black) and 
accumulation-weighted (blue) difference in local diurnal insolation between 
the two extreme states of the precession (austral summer solstice at aphelion 
minus austral summer solstice at perihelion, for a fixed eccentricity value of 
0.05). The annual mean insolation anomaly is zero, but folding the anomaly 
with the accumulation estimates leads to a positive net response. The austral 
summer solstice (21 December) is marked as a vertical yellow dashed line. 


unweighted annual mean insolation (Fig. 3e) does not contain a 
precession component and therefore does not describe the proxy data 
very well (R = 0.34) either. 

A more detailed analysis of the temporal temperature and insolation 
changes reveals that the local temperature data lag slightly behind the 
weighted insolation (precession 26.8°, obliquity 52.7°). However, our 
time lag is approximately the same as the lag of local temperature to the 
northern summer insolation (precession 27.3°, obliquity 53.0°). The 
latter was found to be insignificant if the uncertainty of the chronology 
as well as the sampling uncertainty is taken into account”. 
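
For orientation, the phase lags quoted above can be converted into approximate time lags. This is a rough sketch: the nominal periods of 23,000 years for precession and 41,000 years for obliquity are assumptions introduced here (the precession band in particular spans a range of periods), and the function name is hypothetical.

```python
def phase_lag_to_years(phase_deg, period_years):
    """Convert a phase lag in degrees to a time lag for a cycle of the given period."""
    return phase_deg / 360.0 * period_years

# Nominal band periods in years (assumed for illustration, not taken from the study).
ASSUMED_PERIODS = {"precession": 23_000.0, "obliquity": 41_000.0}

# Phase lags of local temperature behind the accumulation-weighted insolation.
phase_lags_deg = {"precession": 26.8, "obliquity": 52.7}

lags_years = {band: phase_lag_to_years(phase_lags_deg[band], ASSUMED_PERIODS[band])
              for band in phase_lags_deg}
for band, years in lags_years.items():
    print(f"{band}: about {years:.0f} years")
```

Under these nominal periods the lags come out at a few thousand years, so whether they are significant depends on the chronological and sampling uncertainties mentioned in the text.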

Assuming the seasonal temperature sensitivity observed in the 
present-day seasonal cycle of temperature (Fig. 1, 0.067 °C W⁻¹ m²), 
we can also compare the strength of isotope-based temperature changes 
with the ones obtained from our model. The modelled insolation-based 
temperature amplitude (around 0.7 °C peak-to-peak) is too low com- 
pared to the reconstructed temperature change in the orbital bands 
(3 °C peak-to-peak). However, the equilibrium temperature sensitivity 
is considerably higher than the seasonal temperature sensitivity”®. This 
will affect the temperature response to long-term insolation anomalies 
caused by changes in the Earth’s axial tilt. Our model is based com- 
pletely on modern observations, so other mechanisms not captured by 
our approach might increase the amplitude further. One example is 
the dependence of the seasonal accumulation on the summer insola- 
tion intensity, caused by the temperature dependence of the snow 

Figure 3 | Comparison of the temperature reconstruction with the 
accumulation-weighted insolation. a, $T_\mathrm{site}$ at Dome Fuji. b, $T_\mathrm{site}$ filtered in the 
orbital band (bandpass-filtered between 1/15,000 years and 1/50,000 years, 
101,000-year window, finite impulse response filter) (red trace) compared with the 
accumulation-weighted insolation at 77° S (blue trace). c, Northern boreal 
summer insolation at 65° N (black trace) compared with the accumulation-weighted 
local insolation at 77° S (blue trace). d, Local unweighted austral 
summer insolation. e, Local annual mean insolation. f, Mg/Ca-derived sea 


ablation. Sensitivity studies show that this additional effect could 
explain the full precession amplitude (Supplementary Note 3). 
Furthermore, the linearity between insolation and temperature within 
our conceptual model implies that the temperature of the polar winter 
remains constant during all precession phases. However, enhanced 
summer insolation intensity is accompanied by longer Antarctic winters, 
and by reduced winter insolation intensity in the Southern Hemi- 
sphere, in regions north of the Antarctic Circle. These effects might 
decrease Antarctic winter temperatures and thus lead to reduction of 
the accumulation-weighted annual temperature signal. This would amp- 
lify the precession signal that is in phase with Northern Hemisphere 
summer insolation (Supplementary Note 4). 

Given the results of our model, we propose that the orbital varia- 
tions in Antarctic local temperature are a response to the local insola- 
tion if the seasonal pattern of accumulation is correctly taken into 
account. This implies that the interhemispheric symmetry in polar 
climate change might not be due to a causal relationship between 
the hemispheres, but is simply an artefact of the recording system. 

Do these findings contradict the orbital period variability observed 
in sub-Antarctic marine records’? Local climate feedbacks can lead to 
a winter-to-spring sensitivity of the annual temperature and therefore 
to local precession signals in phase with Northern Hemisphere sum- 
mer insolation’*; one example is the spring sensitivity of sea ice’. 
Furthermore, modern sediment-trap data indicate highly seasonal 
patterns of foraminifera fluxes at Chatham Rise, one of the key locations 
for sub-Antarctic marine records. Here, the foraminifera species 
used for the palaeotemperature estimates mainly peak in the austral 
spring’””’. This seasonal recording leads to a precession signal resem- 
bling Northern Hemisphere summer insolation (Fig. 3f, Supplemen- 
tary Note 6). Additionally, some parts of the marine sediment 
chronology are not independently dated but are based on Antarctic 
ice-core chronologies. Thus, it seems likely that the precession signal of 
the south polar regions is a combination of different mechanisms, 

surface temperature of sediment core MD97-2120 (ref. 27), filtered in the 
orbital band (orange trace), compared with the local insolation weighted by the 
seasonal cycle of local foraminiferal flux (blue trace). See also Supplementary 
Note 6. The local accumulation-weighted insolation, which is a proxy for the 
isotopic record, is coherent and nearly in phase with $T_\mathrm{site}$ (b) and very similar to 
the Northern Hemisphere summer intensity (c). Therefore $T_\mathrm{site}$ cannot be used 
to support the remote forcing hypothesis. 


and our findings demand a careful re-evaluation of these Southern 
Hemisphere climate records and recorder systems. 

The local interpretation of the precession and obliquity signal also 
provides insight into the stronger quasi-100,000-year cycle by decoup- 
ling the orbital variability in both hemispheres. In contrast to the view- 
point that Antarctic temperature is driven by Northern Hemispheric 
summer insolation, the hypothesis presented is consistent with terminations 
triggered by either the Northern or the Southern Hemisphere, 
or by a combination of both. In such a scenario, the 
insolation-sensitive sub-Antarctic sea ice, the Southern Ocean as a 
potential driver for CO₂ (ref. 31) and the insolation-sensitive ice sheet 
of the Northern Hemisphere might act together. A Southern Hemisphere 
influence could also explain the phasing of the circulation response 
relative to insolation forcing that contradicts the SPECMAP 
hypothesis. 


METHODS SUMMARY 


Studies of seasonal accumulation on the East Antarctic Plateau are sparse and we 
therefore also considered the three accumulation-related quantities precipitation 
P, net precipitation (P minus evaporation E) and sublimation for our analysis. The 
analysed records that include gauge measurements of precipitation, accumulation 
estimates from stake networks, and moisture flux calculations are described in 
Supplementary Note 1. 

To derive the weighted annual mean insolation, in 100-year steps, the daily 
insolation is weighted with the accumulation estimate, linearly interpolated to 
daily values: 


$$W_\mathrm{weight} = \frac{\int_T W(t)\,A(t)\,dt}{\int_T A(t)\,dt}$$

$W_\mathrm{weight}$ is the annual weighted insolation, t is the day of the year, W(t) is the daily 
insolation, A(t) is the daily accumulation estimate, and T corresponds to one year. 
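
The weighting can be sketched in a few lines, approximating the integrals by daily sums. This is a minimal illustration rather than the study's code: the function names, the equal-length months used for interpolation, the monthly accumulation fractions and the idealized insolation curve are all assumptions.

```python
import math

def monthly_to_daily(monthly, days_in_year=365):
    """Linearly interpolate 12 mid-month values to daily values, wrapping around the year.

    Months are assumed to have equal length, which is a simplification.
    """
    month_len = days_in_year / 12.0
    daily = []
    for d in range(days_in_year):
        pos = (d + 0.5) / month_len - 0.5   # position in months; mid-month is an integer
        lo = math.floor(pos)
        frac = pos - lo
        daily.append((1 - frac) * monthly[lo % 12] + frac * monthly[(lo + 1) % 12])
    return daily

def weighted_insolation(daily_insolation, daily_accumulation):
    """Discrete form of W_weight = sum(W * A) / sum(A) over one year."""
    num = sum(w * a for w, a in zip(daily_insolation, daily_accumulation))
    return num / sum(daily_accumulation)

# Made-up monthly accumulation fractions (January first) with an austral summer minimum.
accumulation_monthly = [0.05, 0.06, 0.08, 0.10, 0.10, 0.10,
                        0.10, 0.10, 0.09, 0.08, 0.07, 0.07]
accumulation_daily = monthly_to_daily(accumulation_monthly)

# Idealized daily insolation peaking near 21 December (day 355), zero in the polar night.
insolation_daily = [max(0.0, 200.0 + 300.0 * math.cos(2 * math.pi * (d - 355) / 365))
                    for d in range(365)]

plain_mean = sum(insolation_daily) / len(insolation_daily)
weighted_mean = weighted_insolation(insolation_daily, accumulation_daily)
print(round(plain_mean, 1), round(weighted_mean, 1))
```

Because less snow accumulates in summer in this toy setup, the weighted mean falls below the plain annual mean, so summer insolation is under-represented in the recorded signal.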

To define a past calendar, a reference date must be arbitrarily chosen in which a 
certain day is aligned to a position of the Earth on the ellipse around the Sun"**. A 
common solution is to fix 21 March to the vernal equinox for any period of the 
past. But because we are studying the local Antarctic insolation, we use a different 
calendar approach and choose to fix the austral solstice to 21 December. Our 
results are insensitive to this choice. Furthermore, using a method completely 
independent of any calendar definitions gives similar results (Supplementary 
Note 2). 

Phylogenomic analyses unravel annelid evolution 


Torsten H. Struck, Christiane Paul, Natascha Hill, Stefanie Hartmann, Christoph Hösel, Michael Kube, Bernhard Lieb, 
Achim Meyer, Ralph Tiedemann, Günter Purschke & Christoph Bleidorn 


Annelida, the ringed worms, is a highly diverse animal phylum that 
includes more than 15,000 described species and constitutes the 
dominant benthic macrofauna from the intertidal zone down to 
the deep sea. A robust annelid phylogeny would shape our under- 
standing of animal body-plan evolution and shed light on the 
bilaterian ground pattern. Traditionally, Annelida has been split 
into two major groups: Clitellata (earthworms and leeches) and 
polychaetes (bristle worms), but recent evidence suggests that other 
taxa that were once considered to be separate phyla (Sipuncula, 
Echiura and Siboglinidae (also known as Pogonophora)) should 
be included in Annelida’*. However, the deep-level evolutionary 
relationships of Annelida are still poorly understood, and a robust 
reconstruction of annelid evolutionary history is needed. Here we 
show that phylogenomic analyses of 34 annelid taxa, using 47,953 
amino acid positions, recovered a well-supported phylogeny with 
strong support for major splits. Our results recover chaetopterids, 
myzostomids and sipunculids in the basal part of the tree, although 
the position of Myzostomida remains uncertain owing to its long 
branch. The remaining taxa are split into two clades: Errantia 
(which includes the model annelid Platynereis), and Sedentaria 
(which includes Clitellata). Ancestral character trait reconstruc- 
tions indicate that these clades show adaptation to either an errant 
or a sedentary lifestyle, with alteration of accompanying morpho- 
logical traits such as peristaltic movement, parapodia and sensory 
perception. Finally, life history characters in Annelida seem to be 
phylogenetically informative. 

Annelids are found throughout the world’s terrestrial, aquatic and 
marine habitats. They represent one of three major animal groups with 
segmentation, so understanding annelid body-plan evolution is crucial 
for elucidating aspects of the evolution of Bilateria*’. Several annelid 
taxa have recently emerged as model organisms in various biological 
disciplines*. Surprisingly, the evolution of Annelida is still poorly under- 
stood, and it is uncertain how well these model organisms represent the 
ancestral character traits in Annelida. To rectify this situation, multi- 
gene data sets are needed to evaluate the diversity and the relationships 
of major annelid clades. 

Annelida traditionally included Polychaeta and Clitellata. Mor- 
phological and molecular data corroborate clitellate monophyly and 
provide robust phylogenetic hypotheses within this taxon’. Polychaetes 
are classified into approximately 80 family-level taxa that are generally 
supported as monophyletic; however, arrangement of these taxa into 
well-supported, more-inclusive nodes is problematic’. Historically, 
polychaetes were classified as either Sedentaria or Errantia on the basis 
of their morphology and mode of life’’*. This systematization was 
dismissed in the 1970s as an arbitrary grouping useful only for 
practical purposes. About 15 years ago, on the basis of morphological 
cladistic analyses, a monophyletic Polychaeta consisting of two major 
clades, Scolecida and Palpata, was proposed, with the latter clade 
divided into Canalipalpata and Aciculata'®. There is increasing mol- 
ecular evidence, however, that places Clitellata, as well as the non- 
segmented taxa Echiura and Sipuncula, within polychaetes and thus 


renders Polychaeta paraphyletic'*. So far, molecular work based on 
only a few genes has not supported the proposed monophyly’® of 
most major polychaete clades. Yet, support for basal nodes in these 
studies is less than 50 or 0.50 for bootstrap support (BS) or posterior 
probability (PP), respectively, resulting in a lack of support for alterna- 
tive hypotheses””. 

To address these major outstanding issues of annelid phylogeny, we 
used a phylogenomic approach, generating expressed sequence tag 
(EST) libraries for 17 annelid taxa, which are in addition to the publicly 
available EST or genomic data from annelids. We reconstructed rela- 
tionships of major annelid taxa using 47,953 amino acid positions 
derived from 231 gene fragments that span 20 traditional polychaete 
‘families’, Siboglinidae, Myzostomida, Echiura, Clitellata, Sipuncula 
and five outgroup taxa. This is the largest phylogenomic data set 
explored so far in annelid phylogeny and has a mean data coverage 
of 41.7% per taxon. 
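
Data coverage per taxon can be read as the fraction of alignment positions for which a taxon has data. The sketch below shows one way to compute it from a concatenated alignment; the function names, the toy sequences and the choice of '-' and '?' as missing-data symbols are assumptions for illustration, not the study's pipeline.

```python
def coverage_per_taxon(alignment, missing="-?"):
    """Fraction of positions with data for each taxon in a concatenated alignment."""
    return {taxon: sum(ch not in missing for ch in seq) / len(seq)
            for taxon, seq in alignment.items()}

def mean_coverage(alignment, missing="-?"):
    """Mean of the per-taxon coverage values."""
    values = coverage_per_taxon(alignment, missing).values()
    return sum(values) / len(values)

# Toy amino acid alignment with hypothetical taxa; all rows have equal length.
toy_alignment = {
    "taxon_A": "MKV-LT??",
    "taxon_B": "MKVALTGH",
    "taxon_C": "----LT-H",
}

per_taxon = coverage_per_taxon(toy_alignment)
overall = mean_coverage(toy_alignment)
print(per_taxon, round(overall, 3))
```

The same per-taxon values could be used to drop poorly covered taxa or genes before an analysis, which is the kind of filtering the sensitivity analyses evaluate.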

Sensitivity analyses of our data (Supplementary Tables 4 and 6) 
showed that increasing the number of positions and mean leaf stability 
had a positive impact on BS, whereas increasing the data coverage by 
removing either genes or taxa with low coverage had no such impact 
(Supplementary Fig. 1). Therefore, the largest data set (47,953 posi- 
tions), with either all taxa (denoted ALL) or excluding the five annelid 
taxa that showed leaf stabilities below 0.925 (denoted EX) was used in 
maximum likelihood and Bayesian inference analyses. These analyses 
retrieved a clade (called clade 1) comprising all annelids with the 
exception of Chaetopteridae, Sipuncula and Myzostomida. This clade 
received significant branch support: ALL, PP = 0.98 (Bayesian infer- 
ence), BS = 88 (maximum likelihood); EX, PP = 0.99 (Bayesian infer- 
ence), BS = 100 (maximum likelihood) (Fig. 1 and Supplementary Figs 
2, 3, 6 and 8). Reconstructing ancestral morphological traits for clade 1 
and Annelida revealed that they were similar, except for some larval 
characters (Fig. 2a and Supplementary Table 5). 

On the basis of this reconstruction, the ancestral annelid had a pair of 
anterior appendages (that is, grooved palps), which functioned in food 
gathering and sensory perception. Other head or pygidial appendages 
were absent. Eyes and nuchal organs were present as sensory organs. Of 
the different chaetal types, only internalized supporting chaetae and 
simple chaetae were part of the ancestral pattern. Reconstructions of 
most other parapodial characters were uncertain, except for the 
possession of prominent notopodial lobes. Although the fossil record 
of early annelids from the Cambrian period is sparse, it nonetheless 
reveals that, congruent with our reconstructions, the early annelids 
had palps, simple chaetae and internalized supporting chaetae but 
did not have other chaetae or appendages such as tentacular, parapodial 
or pygidial cirri. 

In agreement with previous molecular studies, Chaetopteridae, 
which have three distinct body regions, are found in the basal part of 
the annelid tree. Thus, the evolution of segmentation—with predomi- 
nantly homonomous segmentation in clade 1 and Myzostomida, 
heteronomous segmentation in Chaetopteridae and complete reduc- 
tion in Sipuncula—is already highly variable at basal nodes in the 
annelid phylogeny. Moreover, we acknowledge that, in addition to 
Chaetopteridae, Myzostomida and Sipuncula, other taxa such as 
Oweniidae, Dinophilidae or Protodrilida, which were not covered here 
because of a lack of data, might also be placed in the basal part of the 
annelid tree. 

Figure 1 | Reconstruction of the Annelida phylogenetic tree. Majority rule 
consensus trees of the Bayesian inference analysis using the site-heterogeneous 
CAT model of the data set with 39 taxa and 47,953 amino acid positions. Only 
PP (top of branch or alone) and BS (bottom) values ≥ 0.70 or 70, respectively, 
are shown. The branch leading to Myzostomida is reduced by 75%. Annelida 
are highlighted in red, with Sedentaria in blue and Errantia in green. Grey bars 
indicate additional annelid groups. *, BS value for the monophyly of Annelida 
without Myzostomida in the maximum likelihood analysis is 99. 

Figure 2 | Ancestral reconstructions of body and parapodial characters. 
a, Annelida and clade 1. b, Errantia. c, Sedentaria. Body characters (left) and 
parapodial characters (right) are depicted. The state of several parapodial 
characters in Annelida and clade 1 is uncertain, so we depict the two most 
extreme possibilities. Dashed lines or question marks indicate that the state of 
the character is uncertain. bie, bicellular eyes; doc, dorsal cirrus; grp, grooved 
palps; isc, internalized supporting chaetae; laa, lateral antenna; mue, 
multicellular eyes; nuo, nuchal organ; pyc, pygidial cirrus; sic, simple chaetae; 
sop, solid palps; un/h, uncini/hooks; vec, ventral cirrus. 

The major difference between the maximum likelihood and 
Bayesian inference analyses is the placement of Myzostomida. Myzo- 
stomids are either ectocommensals or endoparasites of echinoderms, 
and the systematic placement of this aberrant taxon has proved to be 
problematic. Bayesian inference analysis places Myzostomida 
within Annelida (PP = 0.99 for both data sets (ALL and EX); Fig. 1 
and Supplementary Fig. 6). By contrast, by maximum likelihood ana- 
lyses, long-branched Myzostomida are grouped with Ectoprocta, the 
outgroup taxon with the longest branch (Supplementary Figs 2 and 3). 
There is conclusive support from mitochondrial gene order and morphological 
data that Myzostomida are part of the annelid radiation, 
and it has been shown that their derived sequences can be affected by 
long-branch attraction (LBA). The CAT model of Bayesian inference 
analyses is known to be less affected by LBA than other models, and 
this model proved to be better suited for our data set than was the LG 
model of maximum likelihood analyses (Supplementary Information). 
Notwithstanding the different position of Myzostomida (possibly 
owing to LBA), both maximum likelihood analyses support the mono- 
phyly of Annelida: ALL, BS = 99; EX, BS = 100 (Supplementary Figs 2 
and 3). Moreover, the exclusion of Myzostomida did not substantially 
alter the phylogenetic reconstruction of annelid ingroup relationships 
and BS values (Supplementary Fig. 7). Finally, the different placement 
of Myzostomida in the Bayesian inference and maximum likelihood 
analyses did not affect the reconstructions of ancestral morphological 
traits (Supplementary Table 5). 

Clade 1 split into two well-supported clades: Errantia, which com- 
prised Phyllodocida, Eunicida, Amphinomida and Orbiniidae; and 
Sedentaria, which comprised Clitellata and Echiura, as well as most 
other Scolecida (Capitellidae, Opheliidae and Arenicolidae) and Canali- 
palpata (Terebelliformia, Cirratuliformia, Siboglinidae, Serpulidae and 
Spionidae). Both clades were significantly supported: ALL, PP = 0.99 
(Bayesian inference), BS = 79 (maximum likelihood); EX, PP = 0.99 
(Bayesian inference), BS = 100 (maximum likelihood) (Fig. 1 and 
Supplementary Figs 2, 3, 6 and 8). The placement of Clitellata indicated 
a closer relationship to Terebelliformia/Arenicolidae, Opheliidae and 
Capitellidae/Echiura. Moreover, analyses of branch attachment fre- 
quencies based on the data set comprising all taxa showed that each of 
the five removed annelid taxa is nested in either Sedentaria (Ridgeia, 
Ophelia, Pomatoceros and Malacoceros) or Errantia (Eurythoe), and 
none is moving between clades (Supplementary Figs 4 and 5). 

In an influential publication in the 1990s, the two main competing 
hypotheses of annelid evolution were discussed’: one, starting with a 
ground pattern that resembles an errant, epibenthic organism; and, the 
other, starting with an infaunal burrowing form. Interestingly, we found 
both trends to be realized within annelids. The ground pattern of Errantia 
reveals some important changes with respect to sensory perception and 
motility. On the basis of our reconstructions, the last common ancestor 
of Errantia had lateral antennae, palps (which are solid and restricted to 
sensory perception), a pair of pygidial cirri, nuchal commissures and two 
pairs of multicellular eyes facing in different directions” (Fig. 2b and 
Supplementary Table 5). The parapodia had prominent notopodial and 
neuropodial lobes supported by internalized chaetae, as well as ventral 
cirri. Overall, this pattern can be regarded as adaptations to a more 
active and mobile lifestyle, which requires increased perception of the 
environment, as well as motility by undulation. Prominent parapodial 
lobes are advantageous for rapid movements based on undulation, 
which is mainly achieved by the well-developed longitudinal muscu- 
lature arranged in at least four separate bundles. For example, in sexu- 
ally mature (that is, epigamous) nereidids, adopting a temporary 
pelagic reproductive stage, parapodial lobes are even further enlarged 
and paddle-like than in immature stages'’. Most taxa of Phyllodocida, 
Eunicida and Amphinomida show such an errant, often predatory, 
mode of life and hence were traditionally named Errantia’’. The posi- 
tion of Orbiniidae, which were previously grouped with Scolecida’, 
might be surprising; however, placing them within or close to the 
errant forms had previously been debated on the basis of morpho- 
logical and molecular evidence**"’. Therefore, we named this clade 
Errantia, as it is characterized by adaptation to a more errant life. 

The evolution of parapodia in Sedentaria shows the opposite trend. 
Neuropodial and notopodial lobes are generally smaller than in 
Errantia and lack internalized supporting chaetae (Fig. 2c). In general, 
chaetae are in close proximity to the stiff body wall, an arrangement that 
facilitates a better anchorage in tubes and burrows. Moreover, antennae 
are absent, and palps have been lost independently in several taxa. The 
taxa of this clade are commonly characterized by a sedentary life, as 
more or less sessile organisms that live below stones, tube builders, or 
burrowers by means of peristalsis such as earthworms”. Sedentaria are 
generally microphagous. Taxa without appendages such as those 
formerly grouped as Scolecida’’ are deposit feeders, often ingesting 
sediment. By contrast, taxa with sometimes elaborate head appendages 
such as terebellids or serpulids are surface deposit feeders or filter 
feeders, respectively”. The deposit feeding lifestyle also generally 
applies to most Clitellata. Therefore, we named this clade Sedentaria’* 
(now including Clitellata), and it is characterized by adaptations to a 
more sedentary lifestyle by, for example, the reduction of parapodia 
and loss of internalized supporting chaetae. A key feature is that the 
chaetae are in closer proximity to the stiff body wall, rather than being 
embedded in parapodial lobes (which are more flexible) as is typical for 
errant annelids. Interestingly, errant polychaetes with sedentary life 
strategies such as Lumbrineridae or Onuphidae have adapted to such 
a lifestyle by using different solutions’. 

Hence, within Annelida, there are two major clades, Errantia and 
Sedentaria, whose evolution was driven by the adaptation to two dif- 
ferent modes of life. Errantia show a more mobile and active life 
strategy than Sedentaria, and this is correlated to increased sensory 
perception and motility. Sedentaria are more sessile, with accompany- 
ing reductions of head and body appendages and the position of the 
chaetae being in closer proximity to the body wall than in Errantia. 
Annelids have been successfully established as models in evolutionary 
developmental studies to deduce the characteristics of the last common 
bilaterian ancestor’*. Of the recent model organisms, Platynereis, with 
its well-developed head and parapodial appendages, is a good repres- 
entative of Errantia. By contrast, Capitella (as a burrower with reduced 
appendages), Helobdella (as a clitellate) and Hydroides (as a filter feeder 
using its radiolar crown) represent different microphagous feeder 
types in Sedentaria. 


METHODS SUMMARY 


EST libraries of 1,370 clones, on average, were prepared for 17 annelid species 
(Supplementary Table 1). All original sequence data have been deposited in the 
NCBI Expressed Sequence Tag database (dbEST). EST or genomic data from 17 
additional annelid species and 5 outgroup species were obtained from public 
archives (Supplementary Table 1). These raw EST data were further processed as 
described previously”. Sets of orthologous genes were determined using the pro- 
gram HaMStR in combination with the InParanoid database (without ribosomal 
proteins)*®, and were translated into amino acid sequences using the program 
ESTwise’’. In parallel, we retrieved all ribosomal proteins from databases as 
described previously” (Supplementary Table 2). Each orthologous gene set was 
aligned using MAFFT software” and masked using the program REAP”. Only genes 
that had taxon coverage of at least 33.3% were included in the final super-matrix. 

Phylogenetic trees were inferred from this data set of 39 taxa by using a Bayesian 
inference approach (using the site-heterogeneous CAT model) and a maximum 
likelihood approach (using the LG model). Stabilities of taxa were assessed using the 
leaf stability index as calculated by Phyutility software*® (Supplementary Table 3). 
The five annelid taxa with an index below 0.925 were removed in the second data set, 
and the Bayesian inference analysis was repeated. Branch attachment frequencies of 
these unstable annelid taxa were assessed using the lineage movement option in 
Phyutility° based on the data set with all taxa. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


General outline. EST libraries were prepared for 17 annelid species, and they were 
used in combination with EST or genomic data from 17 additional annelid species 
and 5 outgroup species from public archives, and further processed as described 
previously”®. Sets of orthologous genes were determined using the program HaMStR 
in combination with the InParanoid database (without ribosomal proteins)”®, and 
were translated into amino acid sequences using the program ESTwise”’. In parallel, 
we retrieved all ribosomal proteins from databases as described previously’. Each 
orthologous gene set was aligned using MAFFT software** and masked using the 
program REAP”. Phylogenetic trees were constructed by using a Bayesian inference 
approach and a maximum likelihood approach. Stabilities of taxa were assessed 
using the leaf stability index as calculated by Phyutility software”. The five annelid 
taxa with an index below 0.925 were removed, and the Bayesian inference analysis 
was repeated. Branch attachment frequencies of these unstable annelid taxa were 
assessed using the lineage movement option in Phyutility*®. 

Data assembly. Supplementary Table 1 lists taxa (34 annelids and 5 outgroup taxa) 
used in this study. On collection, samples were frozen at −80 °C. Total RNA was 
isolated using an RNeasy Plant Mini Kit (Qiagen) and then reverse transcribed to 
double-stranded cDNA with the Mint-Universal cDNA synthesis kit (Evrogen) to 
produce amplified cDNA libraries. The cDNA was size fractionated using CHROMA 
SPIN-1000 (Clontech). SfiI-digested cDNA allowed directional cloning into pDNR-lib. 
On average, 1,370 clones (ranging from 368 in Ophelia limacina to 4,135 in 
Myzostoma cirriferum) were successfully 5’-end sequenced from recombinant 
plasmids (at the htpt group of R. Reinhardt) by using Sanger-based sequencing 
technology. For Glycera tridactyla, sequences were generated with 454 technology 
by LGC Genomics. All original sequence data have been deposited in NCBI dbEST*. 

Recent studies successfully used ribosomal proteins obtained from EST data- 
bases to resolve deep metazoan phylogeny'****’. Therefore, ribosomal protein 
sequences were extracted from these EST data (Supplementary Table 2) using the 
human ribosomal proteome (retrieved from the Ribosomal Protein Gene 
Database**) as a search template during local BLAST searches (tblastn algorithm 
and an e-value <e'° as a match criterion). To substantially increase the amount 
of data, we also determined sets of orthologous genes using the program 
HaMStR”, which derives a set of primer taxa from the InParanoid database”, 
generating a set of core orthologous genes to build, train and calibrate a profile 
Hidden Markov Model. This model is then used to search for orthologues in the 
EST data. Further confirmation of the orthology of determined EST sequences was 
achieved in a final step of a reciprocal BLAST search against the proteome of one of 
the primer taxa, ideally the closest relative of the primer taxa to the query taxon. 
Orthology was accepted only if the same gene was retrieved as the best hit as in the 
set of core orthologous genes. We used the following set of primer taxa: Capitella 
teleta, Helobdella robusta, Lottia gigantea, Schistosoma mansoni, Daphnia pulex, 
Apis mellifera and Caenorhabditis elegans. The nucleotide sequence was translated 
into amino acids using ESTwise”’, and each set of orthologous genes was indi- 
vidually aligned using MAFFT”® with default settings. Questionably aligned posi- 
tions were eliminated with the alignment masker REAP” for each individual 
partition using default parameters. 
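The reciprocal check described above can be illustrated with a minimal sketch. It is not the HaMStR implementation: the function name, the dictionaries and the identifiers are hypothetical, and the profile HMM search and the BLAST search are assumed to have been run already, with their best hits stored per EST.

```python
from typing import Dict, Optional


def accept_orthologue(
    est_id: str,
    hmm_hits: Dict[str, str],
    reciprocal_best_hit: Dict[str, str],
) -> Optional[str]:
    """Return the core-orthologue gene for an EST only if the reciprocal check agrees.

    hmm_hits: EST id -> gene proposed by the profile HMM search (hypothetical input).
    reciprocal_best_hit: EST id -> best BLAST hit in the primer-taxon proteome
    (hypothetical input).
    """
    candidate = hmm_hits.get(est_id)
    if candidate is None:
        return None  # the HMM search proposed nothing for this EST
    # Accept only when the reciprocal best hit is the same gene as in the core set.
    if reciprocal_best_hit.get(est_id) == candidate:
        return candidate
    return None


# Toy data, invented for illustration only.
hmm_hits = {"est_001": "geneA", "est_002": "geneB", "est_003": "geneC"}
reciprocal_best_hit = {"est_001": "geneA", "est_002": "geneX"}

accepted = {}
for est in hmm_hits:
    gene = accept_orthologue(est, hmm_hits, reciprocal_best_hit)
    if gene is not None:
        accepted[est] = gene

print(accepted)
```

In this toy case only the first EST passes: the second retrieves a different gene in the reciprocal search, and the third has no reciprocal hit at all.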

For the sensitivity analyses, we generated three super-matrices based on taxon 
coverage per gene. The first matrix consisted of genes that were present in at least 
one-third of the taxa. In the second, the genes were present in at least one-half of 
the taxa, and in the third, the genes were present in at least two-thirds of the taxa. 
Thus, matrix coverage increased from the first to the third super-matrix, but the 
number of positions decreased. Custom Perl scripts were written for all of these 
steps. The data set consisting of genes that were present in at least one-third of the 
taxa was deposited at http://www.treebase.org and can be accessed at http://purl. 
org/phylo/treebase/phylows/study/TB2:S10986. Together with the information 
provided in Supplementary Table 6 and the Supplementary Information, all data 
sets used in the course of our analyses can be generated from this data set. 
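The trade-off between matrix coverage and matrix size can be sketched as follows. The original analyses used custom Perl scripts that are not reproduced here; this Python sketch uses invented gene and taxon names and measures coverage at the level of gene-by-taxon cells, which only approximates coverage counted in alignment positions.

```python
from fractions import Fraction
from typing import Dict, List, Set


def filter_genes(gene_taxa: Dict[str, Set[str]], n_taxa: int, min_fraction: Fraction) -> List[str]:
    """Keep genes that are present in at least `min_fraction` of the taxa."""
    return sorted(
        gene for gene, taxa in gene_taxa.items()
        if Fraction(len(taxa), n_taxa) >= min_fraction  # exact comparison, no float rounding
    )


def cell_coverage(gene_taxa: Dict[str, Set[str]], genes: List[str], n_taxa: int) -> float:
    """Fraction of gene-by-taxon cells that contain data in the filtered matrix."""
    if not genes:
        return 0.0
    filled = sum(len(gene_taxa[g]) for g in genes)
    return filled / (len(genes) * n_taxa)


# Toy presence data for six hypothetical taxa.
taxa = ["t1", "t2", "t3", "t4", "t5", "t6"]
gene_taxa = {
    "g1": {"t1", "t2"},
    "g2": {"t1", "t2", "t3"},
    "g3": {"t1", "t2", "t3", "t4"},
    "g4": set(taxa),
    "g5": {"t6"},
}

thresholds = {
    "one-third": Fraction(1, 3),
    "one-half": Fraction(1, 2),
    "two-thirds": Fraction(2, 3),
}
summary = {}
for label, threshold in thresholds.items():
    kept = filter_genes(gene_taxa, len(taxa), threshold)
    summary[label] = (len(kept), round(cell_coverage(gene_taxa, kept, len(taxa)), 3))
    print(label, summary[label])
```

With these toy data, raising the threshold reduces the number of retained genes while increasing the proportion of filled cells, mirroring the behaviour described for the three super-matrices.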
Phylogenetic analyses. The most appropriate substitution model for these three 
matrices was LG + I + Γ, as determined based on the Akaike information criterion 
using ProtTest*’. Before the time-consuming Bayesian inference analyses, we 
conducted a series of maximum likelihood analyses to assess the influence of 
the number of positions, the percentage of missing data and leaf stability on BS. 
Therefore, taxa that had less than 15%, 17.5% or 20% of the total positions in the 
largest super-matrix were excluded (Supplementary Table 2). Similarly, taxa with a 
leaf stability index of less than 0.875, 0.900 or 0.925 were excluded from the three 
super-matrices (Supplementary Table 3). We did not exclude annelid taxa with an 
index less than 0.950 because this was above the mean leaf stability of 0.943. 
Moreover, we also prepared one data set excluding only Myzostomida from the 
largest data set with 47,953 positions. Finally, we partitioned this data set based on 
our two strategies to assemble the data set. The first data set comprised only the 
ribosomal proteins; the second, the genes that were identified by HaMStR, without 
any ribosomal proteins; and the third, all HaMStR-identified genes, including the 
ribosomal genes (which were also detected by HaMStR). Thus, we had a total of 25 
data sets (Supplementary Table 4). Maximum likelihood analyses were conducted 
with RAxML version 2.7.6 (ref. 37), using 100 replicate searches starting from 
randomized maximum parsimony trees. Confidence values for the edges of the 
maximum likelihood trees were determined based on bootstrap replicates. We 
used the automatic bootstopping option (-# autoMRE) in RAxML to a maximum 
of 1,000 replicates (Supplementary Table 4). Leaf stability indices, as well as lineage 
movements of the unstable taxa, were determined using Phyutility”° and the boot- 
strap trees of the analyses comprising all taxa. 
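As a rough illustration of how a bootstrap support value arises from replicate trees, the sketch below counts how often a given group of taxa is recovered. It assumes a simplification: each replicate tree is represented as a set of rooted clades (frozensets of invented taxon names), whereas RAxML works with bipartitions of unrooted trees. It does not reproduce the bootstopping criterion.

```python
from typing import FrozenSet, List, Set

Clade = FrozenSet[str]


def bootstrap_support(clade: Clade, replicate_trees: List[Set[Clade]]) -> float:
    """Percentage of replicate trees that contain the given clade."""
    if not replicate_trees:
        raise ValueError("at least one replicate tree is required")
    hits = sum(1 for tree in replicate_trees if clade in tree)
    return 100.0 * hits / len(replicate_trees)


# Four toy replicate trees over hypothetical taxa A-D.
replicates = [
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "C"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "D"})},
]

support_ab = bootstrap_support(frozenset({"A", "B"}), replicates)
support_ac = bootstrap_support(frozenset({"A", "C"}), replicates)
print(support_ab, support_ac)
```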

On the basis of the results of the sensitivity analyses, we conducted two Bayesian 
inference analyses using PhyloBayes v3.2d and the site-heterogeneous CAT 
model (which is not available for RAxML), as it has been shown that this model 
is more robust against LBA artefacts and thus less prone to systematic errors in 
phylogenetic data sets*®. One data set comprised all 39 taxa and 47,953 positions, 
and in the other all annelids showing a leaf stability index below 0.925 were 
excluded (34 taxa, 47,953 positions). Each analysis ran eight chains in parallel 
for 29,525 cycles on average (ranging from 28,894 to 29,808) for the data set with 
39 taxa and for 34,693 on average (ranging from 33,560 to 34,944) for the one with 
34 taxa. To conduct these analyses, we used Mac OS X v10.6.4 with 2 × 2.93 GHz 
Quad-core Intel Xeon processors and 16 GB, 1,066 MHz DDR3 RAM. Using all 
eight processors in parallel, the two PhyloBayes analyses ran for 37 days, which is 
equivalent to nearly 10,000 h of CPU time. Stable convergence of likelihood values, 
alpha parameter and tree length of the eight chains was assessed using Tracer 
v1.4.1 (ref. 41), and if we had sampled nearly two times more trees than would be 
discarded as burn-in, this was taken as a stopping point. The first 10,000 cycles 
(trees) of each chain were discarded as burn-in, and the majority rule consensus 
tree containing the PPs was calculated from the remaining trees of the eight chains 
of each Bayesian inference analysis, sampling each second tree. Thus, the con- 
sensus trees are based on a total of 78,099 or 98,771 trees, respectively. We also 
tested a posteriori whether the CAT model was superior to the LG model in the 
PhyloBayes analyses using the cross-validation test’” implemented in PhyloBayes. 
On the basis of the data set comprising 39 taxa and 47,953 positions, this test was 
conducted using ten replicates with a tenfold cross-validation. This means that the 
learning alignment consisted of 90% of the positions of the original alignment, and 
the test alignment consisted of the remaining 10%. The tests were run using the 
tree shown in Fig. 1 as a fixed topology, as suggested by the manual, and 1,100 
cycles with a burn-in of 100. 
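The tree counts and the cross-validation split reported above can be checked with a short sketch. The individual chain lengths are not given, so the mean chain length is used for all eight chains as an assumption; the totals therefore only approximate the reported 78,099 and 98,771 trees. The split function is a generic random partition of alignment columns and is not the PhyloBayes routine.

```python
import random
from typing import List, Tuple


def retained_trees(chain_lengths: List[int], burn_in: int, thin: int) -> int:
    """Trees kept after discarding the burn-in and sampling every `thin`-th tree."""
    total = 0
    for length in chain_lengths:
        kept = max(length - burn_in, 0)
        total += kept // thin
    return total


def cv_split(n_positions: int, folds: int, seed: int) -> Tuple[List[int], List[int]]:
    """Randomly split column indices into learning and test sets (test = 1/folds)."""
    rng = random.Random(seed)
    n_test = n_positions // folds
    test = set(rng.sample(range(n_positions), n_test))
    learning = [i for i in range(n_positions) if i not in test]
    return learning, sorted(test)


# Assumption: every chain has the reported mean length.
approx_39_taxa = retained_trees([29525] * 8, burn_in=10000, thin=2)
approx_34_taxa = retained_trees([34693] * 8, burn_in=10000, thin=2)
print(approx_39_taxa, approx_34_taxa)

learning, test = cv_split(47953, folds=10, seed=1)
print(len(learning), len(test))
```

Under the equal-length assumption the sketch gives 78,096 and 98,768 trees, within a few trees of the reported totals, and a learning alignment of 43,158 positions with a test alignment of 4,795 positions.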
Ancestral state reconstruction. For the ancestral reconstructions, we used a 
morphological data matrix reported previously, which is largely based on 
previously published data matrices. We slightly modified this matrix 
(Supplementary Information) by updating/changing the coding of characters related to “shape 
of parapodia”, “pygidial cirri”, “uncini”, “hooks” and “presence of eyes” according 
to the literature. Instead of the character “aciculae”, we coded the presence 
of internalized supporting chaetae. 

Ancestral reconstructions were done for the last common ancestor of Annelida, 
clade 1, Sedentaria and Errantia based on the Bayesian inference as well as the 
maximum likelihood tree of the 39-taxa data set with 47,953 positions, using 
Mesquite v2.72 (ref. 48). We used the parsimony reconstruction option, and all 
characters were regarded as unordered. Sipuncula and Echiura have lost nearly all 
of their morphological annelid characters. However, it is well known that severe 
secondary losses of characters can strongly hamper reconstructions based on mor- 
phological data because they cannot easily be differentiated from primary absence”. 
Therefore, we did not consider these two taxa in the ancestral reconstructions. 
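Parsimony reconstruction with unordered characters can be illustrated with the Fitch algorithm. The sketch below is an assumption about the general technique, not a description of Mesquite's internals: it handles only strictly bifurcating trees given as nested tuples, runs only the bottom-up pass (which is sufficient for the state set at the root), and uses invented taxa and character states.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(node: Tree, leaf_states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return the parsimony state set at `node` and the minimum number of changes below it."""
    if isinstance(node, str):
        return {leaf_states[node]}, 0  # a leaf carries its observed state
    left, right = node
    left_set, left_cost = fitch(left, leaf_states)
    right_set, right_cost = fitch(right, leaf_states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost  # children agree: no extra change
    return left_set | right_set, left_cost + right_cost + 1  # disagreement: one change


# Toy character (for example, presence of a structure) on a hypothetical five-taxon tree.
tree: Tree = ((("A", "B"), "C"), ("D", "E"))
states = {"A": "present", "B": "present", "C": "absent", "D": "absent", "E": "absent"}
root_set, changes = fitch(tree, states)
print(root_set, changes)

# A two-taxon case in which the ancestral state stays ambiguous.
ambiguous_set, ambiguous_changes = fitch(("A", "C"), states)
print(ambiguous_set, ambiguous_changes)
```

A root set containing more than one state corresponds to a reconstruction that parsimony cannot resolve, which is how an uncertain character state can arise in such analyses.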

To visualize the results of the ancestral reconstructions (Supplementary Table 5), 
we drew graphical depictions of body and parapodial characters using a basic 
schematic approach (Fig. 2). We refrained from using a representative approach 
for two reasons. First, no family of recent polychaetes shows only all of the char- 
acters of any of the four ancestral reconstructions. Second, a representative 
approach using, for example a recent polychaete family, has the potential to mislead 
in that this family might be taken to fully represent basal Annelida, Sedentaria or 
Errantia. However, each recent taxon is a patchwork of plesiomorphic and 
apomorphic characters and is as closely or distantly related to an ancestor in 
evolutionary times as any other recent descendant of that ancestor is. 

For each of the four clades, we used a schematic representation of a homono- 
mously segmented worm with parapodia. For Annelida and clade 1, this worm 
also had grooved palps, nuchal organs and bicellular eyes (Fig. 2a). The recon- 
struction of parapodial features was uncertain except for the composition of 
chaetae, because the last common ancestors of both clades had only simple chaetae 
and internalized supporting chaetae. Therefore, we depicted the two extreme 
possibilities: we included either all features that were eventually present in the 
ground pattern (such as large dorsal notopodial and ventral neuropodial lobes and 
dorsal and ventral cirri (sensory parapodial appendages)) or only those features 
that, on the basis of the reconstruction, were definitely present (such as a large 
notopodial and a small neuropodial lobe). For Sedentaria, the parapodia were 
reduced, with a small neuropodial lobe and the absence of internalized supporting 
chaetae. The uncertain presence of uncini/hooks in the ground pattern of 
Sedentaria is indicated by a question mark (Fig. 2c). For Errantia, we also inferred 
one pair of antennae and pygidial cirri (sensory appendages at the body end), 
multicellular eyes (instead of bicellular eyes) and solid, sensory palps, which 
moved from a dorsal to a ventral position as is typical for such palps (Fig. 2b). 
The parapodia consisted of large notopodial and neuropodial lobes and ventral 
cirri. The presence of dorsal cirri in the ground pattern of Errantia was uncertain, 
so we have shown them with dashed lines only. 
A tension-induced mechanotransduction pathway 
promotes epithelial morphogenesis 


Huimin Zhang, Frédéric Landmann, Hala Zahreddine, David Rodriguez, Marc Koch & Michel Labouesse 


Mechanotransduction refers to the transformation of physical 
forces into chemical signals. It generally involves stretch-sensitive 
channels or conformational change of cytoskeleton-associated 
proteins’. Mechanotransduction is crucial for the physiology of 
several organs and for cell migration”*. The extent to which mech- 
anical inputs contribute to development, and how they do this, 
remains poorly defined. Here we show that a mechanotransduction 
pathway operates between the body-wall muscles of Caenorhabditis 
elegans and the epidermis. This pathway involves, in addition to a 
Rac GTPase, three signalling proteins found at the hemidesmo- 
some: p21-activated kinase (PAK-1), the adaptor GIT-1 and its 
partner PIX-1. The phosphorylation of intermediate filaments is 
one output of this pathway. Tension exerted by adjacent muscles 
or externally exerted mechanical pressure maintains GIT-1 at hemi- 
desmosomes and stimulates PAK-1 activity through PIX-1 and Rac. 
This pathway promotes the maturation of a hemidesmosome into a 
junction that can resist mechanical stress and contributes to co- 
ordinating the morphogenesis of epidermal and muscle tissues. 
Our findings suggest that the C. elegans hemidesmosome is not 
only an attachment structure, but also a mechanosensor that 
responds to tension by triggering signalling processes. We suggest 
that similar pathways could promote epithelial morphogenesis or 
wound healing in other organisms in which epithelial cells adhere to 
tension-generating contractile cells. 

Most organs and complex tissues contain several cell types, all of 
which can contribute to define organ or tissue shape. The way in which 
different cell types communicate during morphogenesis is poorly 
understood. In C. elegans, the epidermis and muscles both guide 
embryonic elongation’. The role of epidermal cells in elongation is 
well defined*. By contrast, our understanding of how muscles affect 
elongation and interact with the epidermis remains vague. Mutants 
with defective muscles arrest midway through elongation at a stage 
known as two-fold, and this phenotype is called Pat (paralysed at two- 
fold)’. Communication between muscles and the epidermis could be 
channelled through junctions that attach the epidermis to the extra- 
cellular matrix at the muscle-epidermis interface. These junctions 
fasten muscles to the exoskeleton and are essential for elongation® 
(Fig. 1a). Each junction includes two hemidesmosome-like units at 
the apical and basal epidermis plasma membranes, with intermediate 
filaments in between® (Supplementary Fig. 1a). Hereafter, we refer to 
hemidesmosome-like junctions as CeHDs (C. elegans hemidesmo- 
somes). The physiological role of CeHDs led us to consider whether 
muscles could signal to the epidermis through a mechanical input. We 
thus examined whether muscle contractions mechanically modify the 
epidermis, and we searched for CeHD proteins that respond to this 
mechanical change. 

If muscles deform the epidermis, their contractions should modify 
the relative positions of two points within the epidermis. We tested this 
possibility using the actin bundles anchored to the plasma membrane’ 


as spatial landmarks, measuring the distance between bundles when 
muscles become active (Fig. 1b-d). Kymographs show that muscle 
contractions reduced this distance by about 50%, because the reduc- 
tion in distance was abolished in muscle-defective embryos (Fig. 1c, d 
and Supplementary Movies 1 and 2). Thus, muscle contractions 
laterally stretch and squeeze the epidermis, a process comparable to 
the stretching of cultured cells grown on elastic membranes*’. 
Disruption of the CeHD core component, VAB-10A (a plectin and 
BPAG1e homologue), also strongly compromised this process (Fig. 1c, d 
and Supplementary Movie 3), outlining the crucial role of CeHDs in 
transmitting muscle tension. Moreover, consistent with earlier findings 
suggesting that muscles help the patterning of CeHDs, muscles 
promoted the maturation of CeHDs from an initial punctate distribution 
(Fig. 1e, f) to short parallel circumferential stripes (Fig. 1g, h), 
co-localizing with epidermal actin bundles (Supplementary Fig. 1d). 
CeHD structure was initially normal in embryos with defective myo- 
filaments, but the reorganization of CeHDs was abnormal in the 
absence of muscle tension (Supplementary Fig. 1b, c). 
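The distance quantification summarized in Fig. 1d (contracted distance divided by relaxed distance, reported as mean and s.e.m.) can be sketched as below. The measurements are invented for illustration and are not data from the study; the function name is hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def mean_and_sem(values: List[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample s.d. divided by sqrt(n))."""
    if len(values) < 2:
        raise ValueError("at least two values are needed for an s.e.m.")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical (contracted, relaxed) distances between two actin bundles, arbitrary units.
pairs = [(5.0, 10.0), (6.0, 10.0), (4.0, 10.0)]
ratios = [contracted / relaxed for contracted, relaxed in pairs]

ratio_mean, ratio_sem = mean_and_sem(ratios)
print(f"contracted/relaxed = {ratio_mean:.2f} +/- {ratio_sem:.3f} (s.e.m.)")
```

A mean ratio of about 0.5 in this toy example corresponds to a reduction in distance of about 50%.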

To identify epidermal proteins that are activated by tension, we 
relied on a recent genetic screen that identified 14 genes whose knockdown, 
combined with a weak mutation in vab-10 called vab-10A(e698) 
(Fig. 2a), affects CeHD biogenesis. Among these genes, 
we focused on the signalling molecule PAK-1 (Fig. 2b), because its 
mammalian homologues control the cytoskeleton and can relay 
changes in arterial pressure to activate downstream signalling’. 
We found that PAK-1 distribution coincides with intermediate- 
filament proteins at all stages of development, reorganizing into short 
parallel stripes typical of CeHDs (Fig. 2c-e and Supplementary Fig. 
2a-c). PAK-1 was enriched at basal CeHDs marked by LET-805 (also 
known as myotactin), although it was also present at apical CeHDs 
(Supplementary Fig. 2d-g). Lack of PAK-1 function affected embryonic 
elongation, reducing body length by 19% (Supplementary Fig. 2h, i, 1). 

Consistent with PAK-1 presence at CeHDs, the kinase-domain deletion 
mutant pak-1(ok448) (Fig. 2b), combined with the weak viable 
mutation vab-10A(e698), affected CeHD integrity. In these 
vab-10A(e698); pak-1(ok448) double mutants, staining for VAB-10A 
showed a failure to form stripes in many areas (arrow in Fig. 2n) or less 
staining where muscles had detached from the body wall (arrowhead in 
Fig. 2n). As a result, more than 60% of these double mutants showed 
muscle detachment, which was associated with elongation arrest 
(Fig. 2k and Supplementary Table 1), and this was not seen in either 
single mutant (Fig. 2f, g, i, j and Supplementary Table 1). VAB-10A 
distribution became abnormal in vab-10A(e698); pak-1(ok448) double 
mutants after the 1.7-fold stage (Fig. 2h), when muscles start to contract, 
suggesting that CeHDs cannot maintain their integrity when exposed to 
muscle-induced tension. Taken together, these findings indicate that 
PAK-1 functions with VAB-10A to strengthen CeHD stability. 

Next, we investigated how PAK-1 helps the assembly of CeHDs. In 
vitro studies established that vertebrate PAK1 phosphorylates the 
Figure 1 | Muscle tension promotes C. elegans hemidesmosome 
maturation. a, Schemes showing a C. elegans embryo (top) and a cross-section 
of the embryo (at the level of red lines; bottom) and its hemidesmosomes 
(CeHDs), numbered 1-4. Three epidermal cell types are found around the 
circumference: dorsal and ventral (which uniquely express elt-3); and lateral. A, 
anterior; D, dorsal; P, posterior; V, ventral. b, Actin bundles (white) in WT 
embryo imaged by following an actin-binding domain labelled with GFP. The 
dashed box shows the region selected for the kymograph in c. Scale bar, 10 µm. 
c, Kymographs showing the distance change between actin bundles (white) in 
WT embryos, unc-112(RNAi) mutant embryos (which are muscle deficient) 
and vab-10A(RNAi) embryos (which are CeHD deficient) (see also 
Supplementary Movies 1-3). Red circles indicate actin-anchoring points 
displaced by muscle contractions. C, contracted distance (orange); R, relaxed 
distance (green). d, Quantification of tension changes in terms of distance 
(contracted divided by relaxed) and time span per contraction. Individual data 
points (n = 15) and mean ± s.e.m. (black crosses) are shown. 
e-h, Immunostaining of WT embryos at the 1.5-2-fold stage of development 
(early; e, f) or the 3-4-fold stage (late; g, h): muscles (red) and VAB-10A 
(green). Dashed boxes in e and g demarcate the regions shown in f and 
h, respectively. 


intermediate-filament protein vimentin’. Hence, C. elegans PAK-1 
might also phosphorylate epidermal intermediate filaments (Sup- 
plementary Fig. 1a). We directly tested this hypothesis in two ways. 
First, we used two-dimensional gel analysis of embryonic extracts 
followed by immunoblotting with MH4 monoclonal antibody, which 
recognizes the CeHD proteins IFA-2 (also known as MUA-6) and 
IFA-3, as well as the non-epidermal protein IFA-1 (ref. 16). This 
revealed the presence of two major intermediate-filament isoelectric 
spots, which were not present after phosphatase treatment of extracts 
(Fig. 3a, arrows) or in pak-1(ok448) extracts (Fig. 3b, arrows). Second, 
tagging IFA-3, the major intermediate-filament protein in the embry- 
onic epidermis, with Myc showed that PAK-1 specifically affects phos- 
phorylation of IFA-3 (Supplementary Fig. 3c). We therefore conclude 
that PAK-1 indeed affects the phosphorylation of an epidermal inter- 
mediate-filament protein. 

We next assessed the effect of phosphorylation on intermediate-filament 
organization. Staining vab-10A(e698); pak-1(ok448) double 

Figure 2 | PAK-1 function is required for CeHD maturation. 
a, b, Conserved domains of VAB-10A and PAK-1 proteins. The missense 
mutation e698 maps to the region predicted to bind to intermediate filaments”®. 
The deletions tm403 and ok448 remove the PAK-1 CRIB domain and kinase 
domain, respectively. ABD, actin-binding domain. c-e, Co-localization of 
PAK-1 with IFA-2 and IFA-3 in a WT larva, as determined by 
immunofluorescence: PAK-1 (green) and IFA (red). Scale bar, 10 µm. 
f-n, Immunostaining for muscle (red) and VAB-10A (green) of vab-10A(e698) 
(f, i, l), pak-1(ok448) (g, j, m) and vab-10A(e698); pak-1(ok448) (h, k, n) mutant 
embryos at early or late stages of development. Dashed boxes indicate area 
shown in panel below. Dashed line in k shows where muscles should be. Arrow 
in n shows area with muscles still attached. Arrowheads in h and n show areas 
with muscles detached. Scale bar, 10 µm. 


mutants with the MH4 monoclonal antibody revealed that abnormal, 
ectopic, intermediate-filament bundles were present outside CeHDs 
(Fig. 3g, arrow). We also observed this phenotype in combination with 
the CRIB domain deletion allele pak-1(tm403) (that is, in 
vab-10A(e698); pak-1(tm403) double mutants) but not in vab-10A(e698), 
pak-1(ok448) or pak-1(tm403) single mutants (Fig. 3d-f and 
Supplementary Fig. 4a, b). The ectopic intermediate-filament bundles 
seem to result from defective anchoring of intermediate filaments to 
the mutant VAB-10A in CeHDs, because tagging the IFA-2/3 
heterodimer partner IFB-1 with green fluorescent protein (GFP) resulted in 
the same ectopic intermediate-filament stripes and muscle detachment 
phenotypes as observed in vab-10A(e698); pak-1(ok448) mutants 
(Supplementary Fig. 5a-d and Supplementary Table 1). Furthermore, 
we identified the S470 residue of IFA-3 as an important regulatory site. 
Changing this serine residue to an alanine abolished IFA-3 phosphor- 
ylation and disrupted the localization of IFA-3 to CeHDs in the vab- 
10A(e698) background (Supplementary Fig. 5e-k). Together, these 
data suggest that lack of IFA-3 phosphorylation reduces the recruit- 
ment of this protein to CeHDs and alters CeHD strength in vab- 
10A(e698) mutants. 

Having established PAK-1 as a functionally important CeHD kinase, 
we next showed that muscle contractility triggers PAK-1 activity. We 
used intermediate-filament phosphorylation and organization as readouts. 
We examined two classes of muscle-defective embryo: one lacking 
EGL-19, a Ca²⁺-activated channel that is required for muscle 

Figure 3 | PAK-1-induced intermediate-filament phosphorylation depends 
on muscle tension. a-c, Two-dimensional immunoblotting analysis showing 
spots that indicate IFA proteins and their phosphorylated forms. Arrows point 
to phosphorylated proteins that are present in WT embryos but not 
phosphatase-treated WT embryos, pak-1 mutants or egl-19 mutants. 
Arrowheads point to isoelectric species, which are always visible in this type of 
analysis. CIP, calf intestinal phosphatase; IEF, isoelectric focusing; MW, 
molecular weight. d-i, Immunostaining of WT and mutant embryos for IFA 
proteins (green) and VAB-10A (red). Arrows point to ectopic 
intermediate-filament bundles (g, i). Scale bar, 10 µm. j, Pull-down assay for two independent 
samples showing levels of GTP-bound CED-10 in WT embryos and a muscle 
mutant (egl-19), both expressing GFP-tagged CED-10 under an epidermal 
promoter (epi::gfp::ced-10). k, CED-10-GTP level, as determined by pull-down 
experiment in j, was normalized to total CED-10 levels after densitometry 
analysis (n = 13; mean, black bar). **, P = 0.0006 (Mann-Whitney U test). 
l, Two-dimensional immunoblotting analysis showing phosphorylated IFA 
(arrows) restored in egl-19 mutants by CED-10(G12V) in a PAK-1-dependent 
manner. Arrowheads point to isoelectric species. m, Body length of 
unc-112(RNAi) L1 larvae expressing constitutively active CED-10, MLC-4 or both 
(n > 26; y axis, arbitrary units). Data are presented as mean ± s.e.m. **, 
P < 3 × 10 ® (Student’s t-test). 


contraction!’; and the other lacking UNC-112, a kindlin homologue 
that is essential for myofilament assembly’®. As is the case in pak-1 
mutants, two-dimensional immunoblotting revealed that both classes 
of muscle-defective embryo (Fig. 3c and Supplementary Fig. 3d, 
arrows), as well as embryos lacking LET-805 or VAB-10A (Supplemen- 
tary Fig. 3d), lacked two phospho-specific IFA spots. Moreover, 
antibody staining showed ectopic intermediate-filament bundles in 
vab-10A(e698); egl-19(n2368cs) double mutants and in vab-10A(e698); 
unc-112(RNAi) embryos (where unc-112(RNAi) denotes mutants that 
lack UNC-112 owing to RNA interference (RNAi)) (Fig. 3i and 
Supplementary Fig. 4c, d; compare with single mutants in Fig. 3e, h and 
vab-10A(e698); pak-1(ok448) double mutants in Fig. 3g and 
Supplementary Fig. 4j). Abolishing PAK-1 function did not cause an 
intermediate-filament organization defect in egl-19(n2368cs) embryos (Supplementary 
Fig. 4e, j) and did not make the defect worse in vab-10A(e698); egl- 
19(n2368cs) mutants (Supplementary Fig. 4f, j and Supplemen- 
tary Table 1). We therefore suggest that PAK-1 acts in the pathway 
defined by muscle tension and that this pathway requires VAB-10A 
function. Together, these data strongly suggest that epidermal PAK-1 
responds to mechanical stimulation by modifying intermediate-filament 
phosphorylation. 

We extended our study to identify the missing links between PAK-1 
activity and muscle tension. Consistent with the GTPases Rac and 
CDC42 being the most common PAK activators'’, we found that 
PAK-1 activation by tension requires the GTPase Rac. First, we measured 
the levels of GTP-bound CED-10 (the C. elegans homologue of 
Rac) and CDC-42 in the epidermis by pull-down assays. We observed a 
significant reduction (38%) in the CED-10-GTP level when muscle 
tension was lost (Fig. 3j, k). In comparison, the CDC-42-GTP level 
was reduced by only 12%, with a lower level of confidence (Sup- 
plementary Fig. 6b). In vivo, reducing the function of CED-10 in 
vab-10A(e698) embryos caused similar ectopic intermediate-filament 
bundling phenotypes (Supplementary Figs 4j and 6f; compare with 
Fig. 3g). Conversely, epidermal expression of CED-10(G12V)”, an 
amino acid substitution mutant that is constitutively active, rescued 
intermediate-filament phosphorylation and bundling defects caused 
by tension loss (Fig. 3l and Supplementary Fig. 6c, d) in a 
PAK-1-dependent manner (Fig. 3l). Yet CED-10(G12V) failed to rescue the 
elongation arrest of muscle-defective embryos (Fig. 3m). 

One possibility is that muscle tension promotes epidermal processes 
in addition to intermediate-filament phosphorylation. Specifically, 
because CeHDs co-localize with actin bundles, muscle tension could 
activate non-muscle myosin II, a key molecule that drives cell shape 
changes in elongation*®*'. We tested this possibility and found that 
the combined expression of a constitutively active CED-10 (CED- 
10(G12V)) and a constitutively active version of the myosin regulatory 
light chain MLC-4 (MLC-4DD)” significantly rescued the elongation of 
unc-112-defective embryos (Fig. 3m). Rescue was partial, and we pre- 
sume that this is either because CED-10(G12V) and MLC-4DD cannot 
fully recapitulate the on-off pattern of muscle tension or because tension 
stimulates additional pathways. We have not tried to unravel the path- 
way leading to MLC-4 activation in normal embryos, but we conclude 
that muscle tension has more than one output in the epidermis. 
Together, we suggest that CED-10 responds to muscle tension, inducing 
the kinase activity of PAK-1 and strengthening CeHDs with VAB-10A. 

The involvement of CED-10 in relaying muscle tension prompted us 
to look for the Rac guanine-nucleotide exchange factor (RacGEF) that 
acts in the pathway. We examined the potential involvement of four 
GEF proteins that are commonly found to act with PAK in vertebrates’’, 
and we identified PAK-interacting exchange factor (PIX-1) as being 
involved in C. elegans (Fig. 4c and Supplementary Fig. 4g-i). Previous 
studies have defined a highly conserved signalling complex containing 
PAK, PIX and G-protein-coupled receptor kinase interactor (GIT) that 
interacts with Rac/CDC42 GTPases””’. Strikingly, both C. elegans PIX- 
1 and GIT-1, visualized by functional translational GFP constructs”*, 
localized to CeHDs (Fig. 4a, b and Supplementary Fig. 7a-g), suggesting 
that they could act together with PAK-1. Lack of either PIX-1 or GIT-1 
function affected normal elongation (Supplementary Fig. 2h-l), and 
when combined with vab-10A(e698) resulted in CeHD defects 
(Fig. 4c-f and Supplementary Table 1). Moreover, two-dimensional 
immunoblotting showed that pix-1- and pak-1-null mutants have 
identical intermediate-filament phosphorylation profiles (Supplemen- 
tary Fig. 7h and Fig. 3b). We conclude that PIX-1, GIT-1 and PAK-1 
together regulate intermediate-filament phosphorylation and CeHD 
biogenesis. 

Figure 4 | GIT-1 maintenance at CeHDs in a tension-dependent manner 
and PIX-1 promote PAK-1 activation. a, b, Localization of translational 
PIX-1-GFP (a) and GIT-1-GFP (b) in WT embryos. c, d, Immunostaining for IFA 
proteins (green) in vab-10A(e698); pix-1(gk416) and vab-10A(e698); 
git-1(tm1962) double mutants in early-stage embryos. Arrows point to ectopic 
intermediate-filament bundles. e, f, Immunostaining for VAB-10A (green) and 
muscle (red) in late-stage embryos of listed mutants, showing muscle 
detachment (arrows). g, Diagram showing the set-up of force stimulation. 
h, Quantification of CeHD-localized GIT-1-GFP level compared with time 
zero (n = 12). Data are presented as mean ± s.e.m. **, P = 0.009 
(Mann-Whitney U test). i-l, Representative images showing GIT-1-GFP localization 
(arrows) in unc-112(RNAi) embryos (denoted pat) with (k, l) or without 
(i, j) external force stimulation. a-f, i-l, Scale bars, 10 µm. 


Previous reports defined a Rac-independent PIX-1-GIT-1-PAK-1 
signalling pathway driving distal-tip cell migration in C. elegans**. 
However, PIX-1 seems to act through Rac during CeHD maturation, 
because CED-10-GTP levels were 25% lower in pix-1-null embryos 
than in wild-type embryos (Supplementary Fig. 7i, j). We interpret the 
differences in CED-10-GTP levels in pix-1-null and muscle-deficient 
mutants (25% reduction versus 38% reduction) as an indication that 
muscles activate CED-10 outside CeHDs, whereas PIX-1 is mainly 
found at CeHDs (Fig. 4a). 

The identification of PIX-1 and GIT-1 as crucial factors in CeHD 
biogenesis posits them as early effectors of muscle tension. To define 
how they become activated, we examined their distribution in muscle- 
deficient embryos. Whereas PAK-1 and PIX-1 still localized to CeHDs 
in the absence of muscle tension (Supplementary Fig. 8a-h and 
Supplementary Movies 4 and 5), GIT-1 progressively disappeared 
from CeHDs as embryos stopped elongation (Fig. 4i, j, 
Supplementary Fig. 8i-l and Supplementary Movies 6 and 7). This finding 
suggests that muscle tension is required for maintaining GIT-1 protein at 
CeHDs. If correct, this model predicts that external mechanical pres- 
sure should substitute for muscle tension. We tested this prediction by 
submitting UNC-112-defective embryos to repeated mechanical pres- 
sure (Fig. 4g and Supplementary Fig. 9). Compared with untreated 
embryos, this regimen considerably retarded the diffusion of GIT-1 
away from CeHDs in UNC-112-depleted embryos (Fig. 4h-l). We 
conclude that CeHDs are indeed mechanosensitive and are under 
the direct influence of physical forces. 

Studies relying on cell stretching in vitro have outlined the role of 
integrin receptors in relaying tensile stretch’”*. Likewise, in C. elegans, 
we propose that the extracellular matrix receptor LET-805 or its 
interacting partners relays muscle tension. This could in turn trigger a 
conformational change of a CeHD protein (for example, VAB-10A) 
able to maintain GIT-1 at CeHDs, as has been observed for talin in 
focal adhesions”*. The identity of the protein(s) that transmits muscle 
tension and anchors GIT-1 to CeHDs remains to be uncovered. 
Furthermore, we suggest that GIT-1 maintains a functional link 
between tension and PIX-1-CED-10-PAK-1, possibly by keeping 
PIX-1 in a conformational state in which it is able to activate CED- 
10 (Supplementary Fig. 10). 

In conclusion, the identification of the GIT-1-PIX-1-PAK-1 signal- 
ling module implies that CeHDs, and presumably vertebrate hemides- 
mosomes, not only are structural entities, but also are endowed with 
signalling potential. Since the discovery of the Pat mutant phenotype’, 
the reason why muscle contraction is required for embryonic elonga- 
tion has remained elusive. Our demonstration that muscle tension 
activates PAK-1-PIX-1-GIT-1 signalling and non-muscle myosin II 
clearly supports a hypothesis based on a mechanotransduction process 
during elongation. Our results raise the possibility that contractile cells 
could locally influence the behaviour of adjacent epithelial cells in other 
developmental settings, particularly in organs in which epithelial cells 
are lined with smooth muscle cells or in pathological situations such as 
wound healing and cancer. Contractile forces seem to act like yin and 
yang in development: too much force will tear a tissue apart, but 
moderate and sustained force will promote differentiation. 


METHODS SUMMARY 


Pull-down assays to analyse GTPase activity were performed using the Rac/Cdc42 
activation assay Biochem Kit (Cytoskeleton). To apply external forces to embryos, 
a needle with a 40-µm blunt end was positioned above embryos that had been 
immobilized on a glass-based culture dish (IWAKI) coated with poly-lysine and 
placed on an inverted TCS SP2 confocal microscope (Leica). The microscope was 
then programmed for a time-lapse sequence in xyzt dimension with a 6-µm z 
distance, at a 1.6-s periodicity to mimic the pulse of muscle contraction. A full 
description, including strain details, construct descriptions, other microscopy 
experiments and immunoblotting approaches, can be found in the Methods. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


Strains and genetic methods. Control N2 (Bristol) and other strains of C. elegans 
were propagated as described previously”’ at 20 °C. Mothers were shifted to 15 °C 
or 12 °C before egg-laying, when indicated. Alleles used in this study are 
vab-10A(e698), pak-1(ok448), pak-1(tm403), egl-19(n2368cs), ced-10(n3246), 
pix-1(gk416) and git-1(tm1962). The actin-binding-domain-GFP construct 
mceIs51[lin-26p:: ABD yap-19::GFP, myo-2p::GFP] is driven by an epidermal 
promoter and reveals only actin filaments present in the epidermis”. egl-19 encodes 
the α-subunit of the voltage-gated Ca²⁺ channel; it is expressed in muscles and 
neurons but not in the epidermis'’. The missense allele egl-19(n2368cs) leads to 
mild muscle defects at 20 °C and to a Pat phenotype at 12 °C°. The allele 
pak-1(ok448) encodes a protein with a deleted kinase domain; pak-1(tm403) encodes a 
protein with a deleted CRIB domain”™*. vab-10A(e698) is a viable mutation 
causing animals to have a bent head'°”’. The null allele pix-1(gk416) encodes a protein 
with the entire SH3 domain deleted, causing a premature frameshift’. The strong 
loss-of-function allele git-1(tm1962) encodes a protein that lacks the second GIT 
domain, which is presumably required for binding to PIX-1 (ref. 24). The loss-of- 
function allele ced-10(n3246) is a G-to-A missense mutation resulting in a G60R 
substitution”. The tvIs41[ifb-1::GFP, rol-6(su1006)] integrated strain was 
provided by L. Broday*’. 

RNAi. RNAi was induced either by bacterial feeding using specific clones from the 
MRC feeding RNAi library, after verifying the sequence identity of the corres- 
ponding insert’*”’, or by microinjecting double-stranded RNA corresponding to 
the relevant gene. Bacterial feeding RNAi was used for all experiments involving 
unc-112 and for additional biochemical tests involving let-805 and vab-10A knock- 
down. Mothers were fed with double-stranded RNA corresponding to those genes 
from the L3 stage, generating 80-100% of embryos arrested at about the two-fold 
stage. RNAi by microinjection was used to test whether pix-1, vav-1, unc-73 or sos-1 
induced an intermediate-filament bundling phenotype in the vab-10A(e698) back- 
ground. The following primers were used for generating double-stranded RNA: 
pix-1, 5'-taatacgactcactatagggeatttgtgtgaaacccttcg, 3’-taatacgactcactataggcatgaaaa 
cactcacttcttcg; and sos-1, 5'-taatacgactcactatagggaaaacggaaagattcgtct, 3’- taatac- 
gactcactatagggacccattgattgatgacac. Double-stranded RNA for vav-1 and unc-73 
were generated using plasmids from the MRC RNAi library as templates”’. 

Molecular biology and transgenesis. The translational PAK-1-GFP fusion was 
generated by PCR cloning of the pak-1 coding sequence upstream of the GFP coding 
sequence in the vector pPD95.75, using a primer located 4 kilobases (kb) upstream 
of the pak-1 start codon. Translational PIX-1-GFP and GIT-1-GFP fusion con- 
structs were provided by H.-J. Cheng. To drive gfp::ced-10 and gfp::cdc-42 in the 
epidermis, wild-type ced-10 and cdc-42 cDNAs were PCR amplified from total 
embryonic RNA and cloned in-frame downstream of the GFP coding sequence 
under the control of a 432-base-pair dpy-7 promoter fragment* (pPD95.75 back- 
bone). Sequence of the constitutively active CED-10(G12V) construct (provided by 
J. Nance) was extracted by PCR from the plasmid pDA80 (ref. 33) and inserted into 
a pPD95.75 derivative lacking the GFP coding sequence; the same dpy-7 promoter 
piece was added. The constitutively active MLC-4(T17DS18D) form was generated 
from the plasmid Pmlc-4::gfp::mlc-4(T17DS18D)”; the mlc-4 promoter was 
replaced by the elt-3 promoter, which is active only in dorsal and ventral epidermal 
cells in contact with muscles**. Pelt-3::gfp::mlc-4(T17DS18D) is referred to in the 
text as MLC-4DD. The ifa-3::myc fusion construct was generated by using 6-kb ifa-3 
genomic sequence, including 3-kb upstream promoter, with the tag inserted just 
before the ifa-3 stop codon. Mutagenesis was carried out using a mutagenesis kit 
(Stratagene). Transgenes were injected at a concentration of 10 ng µl⁻¹ for 
Ppak-1::pak-1::gfp, Ppix-1::pix-1::gfp, Pgit-1::git-1::gfp, Pdpy-7::gfp::ced-10 and 
Pdpy-7::gfp::cdc-42; 1 ng µl⁻¹ for Pdpy-7::ced-10(G12V) and ifa-3::myc; and 2 ng µl⁻¹ for 
Pelt-3::mlc-4(T17DS18D). For each transgene, two lines were selected for further 
analysis. 

Elongation/body-length measurement. To measure the elongation defects of 
pak-1, pix-1 and git-1 mutants, wild-type and mutant mothers were bleached, 
and eggs were left to hatch without bacteria at 20 °C for 20 h. DIC images of newly 
hatched L1 larvae were taken under ×10 magnification, and the body length of 
each larva was measured using ImageJ software (http://rsb.info.nih.gov/ij/). To 
test whether expression of the constitutively active proteins CED-10(G12V) and 
MLC-4DD rescues the elongation of muscle-defective (Pat) embryos, strains 
carrying Pdpy-7::ced-10(G12V), Pelt-3::mlc-4DD or both transgenes were fed on 
bacteria containing unc-112(RNAi) for 48 h. Paralysed larvae were obtained by 3-h 
egg laying followed by 24-h incubation at 20°C. DIC images and fluorescent 
images (for visualizing the presence of the transgene by co-injection markers) of 
newly hatched L1 larvae were taken under ×20 magnification, and the body length 
of each larva was measured using ImageJ. Comparisons were made between 
transgene-negative and transgene-positive larvae produced from the same 
mothers. Statistical analysis was carried out by using Student’s t-test, and 
significance was accepted at P < 0.01. 
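
The comparison between transgene-negative and transgene-positive larvae can be sketched in a few lines of Python. This is an illustration only: the function names and the sample measurements are assumptions made for the example, not the analysis code or data used in the study. The sketch computes mean and s.e.m. for a group and the pooled-variance Student's t statistic; converting t into a P value requires a t-distribution from a statistics package, which is left out here.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of numbers."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem


def student_t(a, b):
    """Pooled-variance two-sample Student's t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, df


# Hypothetical body lengths (arbitrary units); not data from the paper.
transgene_negative = [1.00, 1.05, 0.95, 1.02, 0.98]
transgene_positive = [1.30, 1.25, 1.35, 1.28, 1.32]

t_stat, df = student_t(transgene_positive, transgene_negative)
print(f"t = {t_stat:.2f}, df = {df}")
```

Siblings from the same mothers are compared here as two independent groups, which matches the unpaired design described above.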


Immunostaining and fluorescence microscopy. Embryos were fixed and stained 
by indirect immunofluorescence as described elsewhere’. Dilution factors for 
primary antibodies were anti-VAB-10A (4F2)'°, 1/1000; anti-PAK-1 (provided 
by L. Lim)*, 1/200; anti-intermediate filament (MH4)*° and anti-LET-805 
(MH46)"', 1/500; uncharacterized muscle antigen*” (NE8/4C6, MRC), 1/50; 
anti-GFP (2A3, IGBMC antibody lab), 1/500; and anti-Myc (M6, IGBMC 
antibody lab), 1/1000. MH monoclonal antibodies were purchased from the 
DSHB (Iowa University). The MH4 monoclonal antibody recognizes three IFA 
intermediate filaments, IFA-1, IFA-2 and IFA-3, all of which can form heterodi- 
mers with IFB-1 (refs 16, 33, 38, 39). IFA-1 is present in the pharynx, vulva and 
several neurons; IFA-2 and IFA-3 are both present in the epidermis’®**“°. Genetic 
analysis established that loss of IFA-3 function results in embryonic elongation and 
CeHD phenotypes comparable to those observed when IFB-1, VAB-10A or LET- 
805 are missing'®'"'®. By contrast, IFA-2 acts during larval development'***”°. 

For still images of immunostained embryos and translational GFP-fusion 
embryos, stacks of images were captured every 0.3 µm using a TCS SP5 confocal 
microscope (Leica); generally, 20 confocal sections were projected with maximum 
intensity and processed using ImageJ. Translational GFP-fusion strains carrying 
the cryosensitive egl-19(n2368cs) allele were grown and kept at 12°C before 
imaging. 

Time-lapse movies were taken using a DMI6000 spinning-disk set-up (Andor 
Revolution/Leica). Images of the actin-binding-GFP line” were captured every 
125 ms, using five stacks of images with 0.2-µm spacing. Kymograph analysis was 
performed using MetaMorph software (Universal Imaging). Movies of 
PAK-1-GFP and GIT-1-GFP in elongating or paralysed embryos were recorded every 
5 min, using ten stacks of images with 0.3-µm spacing for about 1 h. 

To quantify ectopic intermediate-filament stripes, images of 7-15 MH4- 
immunostained embryos were taken for each genotype. Images were then shuffled 
and genotype blinded. An investigator who was not previously involved in the 
study counted the number of ectopic intermediate-filament stripes for each 
embryo. Results are shown as the mean number of ectopic intermediate-filament 
stripes present in each embryo for each genotype. 
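
The shuffling and blinding step described above could be organized along the following lines. This is a minimal sketch under stated assumptions: the function names, the `img_001`-style codes and the example genotype labels are hypothetical and are not taken from the study.

```python
import random
import statistics
from collections import defaultdict


def blind_images(images, seed=None):
    """Shuffle (genotype, filename) pairs and hide them behind neutral codes.

    Returns (coded, key): `coded` is the list of codes shown to the scorer,
    `key` maps each code back to its (genotype, filename) pair.
    """
    rng = random.Random(seed)
    shuffled = list(images)
    rng.shuffle(shuffled)
    key = {f"img_{i:03d}": pair for i, pair in enumerate(shuffled, start=1)}
    return list(key), key


def mean_stripes_per_genotype(counts, key):
    """Unblind the scorer's stripe counts and average them per genotype."""
    by_genotype = defaultdict(list)
    for code, n_stripes in counts.items():
        genotype, _filename = key[code]
        by_genotype[genotype].append(n_stripes)
    return {g: statistics.mean(v) for g, v in by_genotype.items()}


# Hypothetical example: four images from two genotypes.
images = [("single", "a.tif"), ("single", "b.tif"),
          ("double", "c.tif"), ("double", "d.tif")]
coded, key = blind_images(images, seed=0)
```

The scorer only ever sees the entries of `coded`; the `key` dictionary stays with someone else until all counts are recorded.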

Two-dimensional gel electrophoresis and immunoblotting. Two-dimensional 
gel electrophoresis was carried out using 11-cm ReadyStrip IPG strips pH 5-8 (for 
MH4 antibody) or pH 3-6 (for anti-Myc antibody) in a PROTEAN IEF cell 
(Bio-Rad) according to the manufacturer’s protocol. C. elegans embryos at 1.5/3-fold 
stage were obtained by 2-h egg laying followed by growth for 6 h at 20 °C or 16 h at 
12°C. Embryonic protein extracts were prepared by homogenization in a rehy- 
dration buffer containing 8 M urea, 3% CHAPS, 50 mM dithiothreitol and 0.2% 
Bio-Lyte Ampholyte. Proteins were transferred and immunoblotted with MH4 
(anti-intermediate filament) antibody or anti-Myc (M6) monoclonal antibody 
using standard protocols. The major spots in each sample were positioned at 
the same distance from the anode. 

GTPase pull-down assay. The pull-down assay to analyse CED-10 and CDC-42 
activity was performed using a Rac/Cdc42 activation assay Biochem Kit 
(Cytoskeleton) according to the manufacturer’s protocol. To ensure that we would 
measure only the amount of GTP-bound and GDP-bound Rac/CDC-42 present in 
the epidermis, extracts were prepared from animals carrying a GFP-tagged 
GTPase transgene under the control of the epidermis-specific promoter dpy-7 
(ref. 32) (see above). Embryonic protein extracts were prepared by homogenizing 
1.5/3-fold stage C. elegans embryos (see previous section) in cell-lysis buffer 
(CLB01, Cytoskeleton) at 4 °C. Two to three pairs of samples were processed 
together each time to ensure quick processing. The compatibility of the Rac/ 
Cdc42 activation assay kit with the C. elegans system was tested and confirmed 
using GTPγS-loaded or GDP-loaded CED-10 or CDC-42 in embryo lysates before the 
analysis, as recommended by the manufacturer’s protocol. After pull-down, the 
amount of GTP-bound GTPase was analysed by immunoblotting against the GFP 
tag. Densitometry analysis was performed using ImageJ. For all quantification 
experiments, statistical analysis was carried out by using the non-parametric 
Mann-Whitney U test, and significance was accepted for P < 0.01. 
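
The quantification described in this section (normalizing the GTP-bound band to total GTPase, then comparing groups with a Mann-Whitney U test) can be illustrated as follows. This is a sketch, not the authors' analysis: the function names and all numbers are assumptions, and the P value uses the normal approximation without tie correction, whereas a statistics package would normally supply an exact test for small samples.

```python
import math


def normalized_gtp(gtp_band, total_band):
    """Ratio of pulled-down (GTP-bound) signal to total GTPase signal per sample."""
    return [g / t for g, t in zip(gtp_band, total_band)]


def percent_reduction(control, mutant):
    """Percentage drop of the mutant mean relative to the control mean."""
    mean_c = sum(control) / len(control)
    mean_m = sum(mutant) / len(mutant)
    return 100 * (1 - mean_m / mean_c)


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test; returns (U, approximate P value)."""
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):          # tied values share the average rank
            ranks[pooled[k][1]] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical densitometry values (arbitrary units); not data from the paper.
wt = normalized_gtp([50, 48, 55, 52], [100, 100, 100, 100])
mutant = normalized_gtp([30, 33, 31, 35], [100, 100, 100, 100])
print(percent_reduction(wt, mutant), mann_whitney_u(wt, mutant))
```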

External mechanical stimulation of C. elegans embryos. To apply external 
forces to embryos lacking internal muscle tension, microfilament needles were 
produced from glass capillaries using a DMZ universal puller. The needle tip was 
melted using a heater scope to create a blunt end of about 40 µm in diameter. The 
blunt-ended needle was installed onto an NK2 micromanipulator (Eppendorf) 
next to a TCS SP2 confocal microscope (Leica). Pat embryos carrying a GIT-1- 
GFP translational reporter were obtained by unc-112(RNAi) feeding. Embryos 
were placed on a 12-mm glass-based culture dish (IWAKI) coated with poly- 
lysine. The culture dish was filled with M9 buffer at 1/2 dilution. After mounting 
the culture dish containing embryos onto the microscope, the glass needle tip was 
carefully placed on top of the embryo so that it just touched the eggshell. The 
confocal microscope was programmed for a time-lapse sequence in xyzt dimension, 
with a z distance of 6 µm, such that each upward movement of the stage 
towards the needle tip squeezed the embryo in between. Pressing was done every 
1.6 s, at a rhythm that approximately mimicked the pulse of muscle contractions. 
Confocal images were taken before pressing at about the 1.4-fold stage, 30 min and 
60 min after pressing. Individual stacks acquired with the confocal microscope 
were processed using ImageJ and a three-dimensional median filter followed by a 
maximum intensity projection’. The GIT-1-GFP levels at CeHDs were deter- 
mined by subtracting the background levels immediately adjacent to the CeHDs 
(Supplementary Fig. 7c, d). All final values were presented as the ratio against time 
zero. Statistical analysis was carried out with the non-parametric Mann-Whitney 
U test, and significance was accepted for P < 0.01. 
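
The image quantification described above (maximum-intensity projection, background subtraction next to the CeHDs, and expression as a ratio against time zero) can be sketched in plain Python. The function names and intensity values are assumptions for illustration, and the three-dimensional median filter applied before projection is omitted from this sketch.

```python
def max_projection(stack):
    """Per-pixel maximum across the z sections of a stack (list of 2-D lists)."""
    return [[max(pixels) for pixels in zip(*rows)] for rows in zip(*stack)]


def ratio_to_time_zero(timecourse):
    """Background-subtract each (CeHD, background) pair and divide by time zero.

    `timecourse` is ordered in time; its first entry is the pre-pressing image.
    """
    signals = [cehd - background for cehd, background in timecourse]
    if signals[0] <= 0:
        raise ValueError("time-zero signal must exceed background")
    return [s / signals[0] for s in signals]


# Hypothetical mean intensities at 0, 30 and 60 min; not data from the paper.
measurements = [(120, 20), (95, 20), (70, 20)]
print(ratio_to_time_zero(measurements))
```

Ratios computed this way for pressed and unpressed embryos are the kind of values that would then be compared with the Mann-Whitney U test.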

SCF^FBW7 regulates cellular apoptosis by targeting 
MCL1 for ubiquitylation and destruction 


Hiroyuki Inuzuka', Shavali Shaik'*, Ichiro Onoyama”*, Daming Gao', Alan Tseng’, Richard S. Maser?**, Bo Zhai’, Lixin Wan', 
Alejandro Gutierrez®, Alan W. Lau', Yonghong Xiao®, Amanda L. Christie®’, Jon Aster’, Jeffrey Settleman’, Steven P. Gygi’, 
Andrew L. Kung®’, Thomas Look®, Keiichi I. Nakayama’, Ronald A. DePinho? & Wenyi Wei! 


The effective use of targeted therapy is highly dependent on the 
identification of responder patient populations. Loss of FBW7, 
which encodes a tumour-suppressor protein, is frequently found 
in various types of human cancer, including breast cancer, colon 
cancer’ and T-cell acute lymphoblastic leukaemia (T-ALL)”. In line 
with these genomic data, engineered deletion of Fbw7 in mouse T 
cells results in T-ALL**, validating FBW7 as a T-ALL tumour 
suppressor. Determining the precise molecular mechanisms by 
which FBW7 exerts antitumour activity is an area of intensive 
investigation. These mechanisms are thought to relate in part to 
FBW7-mediated destruction of key proteins relevant to cancer, 
including Jun®, Myc’, cyclin E* and notch 1 (ref. 9), all of which 
have oncoprotein activity and are overexpressed in various human 
cancers, including leukaemia. In addition to accelerating cell 
growth”, overexpression of Jun, Myc or notch 1 can also induce 
programmed cell death’’. Thus, considerable uncertainty sur- 
rounds how FBW7-deficient cells evade cell death in the setting 
of upregulated Jun, Myc and/or notch 1. Here we show that the E3 
ubiquitin ligase SCF^FBW7 (a SKP1-cullin-1-F-box complex that 
contains FBW7 as the F-box protein) governs cellular apoptosis 
by targeting MCL1, a pro-survival BCL2 family member, for 
ubiquitylation and destruction in a manner that depends on phos- 
phorylation by glycogen synthase kinase 3. Human T-ALL cell lines 
showed a close relationship between FBW7 loss and MCL1 over- 
expression. Correspondingly, T-ALL cell lines with defective 
FBW7 are particularly sensitive to the multi-kinase inhibitor 
sorafenib but resistant to the BCL2 antagonist ABT-737. On the 
genetic level, FBW7 reconstitution or MCL1 depletion restores 
sensitivity to ABT-737, establishing MCL1 as a therapeutically 
relevant bypass survival mechanism that enables FBW7-deficient 
cells to evade apoptosis. Therefore, our work provides insight into 
the molecular mechanism of direct tumour suppression by FBW7 
and has implications for the targeted treatment of patients with 
FBW7-deficient T-ALL. 

MCL1 is frequently overexpressed in various leukaemias through 
mechanisms that are not fully understood’*. MCL1 is distinct from 
other BCL2 family members in its extremely unstable nature’, which 
provides a mechanism for cells to switch to either survival or apoptotic 
mode in response to various stresses'*. Phosphorylation of MCL1 by 
glycogen synthase kinase 3 (GSK3) regulates the stability of MCL1 (ref. 
13), but little is known about the identity of the E3 ubiquitin ligase that 
targets phosphorylated MCL1 for destruction. On examination of the 
GSK3-mediated phosphorylation sites in MCL1, we surmised that they 
resemble a degron sequence that can be recognized by FBW7 (also 
known as FBXW7) (Fig. 1a), prompting us to test the possibility that 
GSK3-mediated phosphorylation of MCL1 triggers the degradation of 
MCL1 by FBW7. Depletion of FBW7 (Fig. 1b) or the SCF components 
cullin 1 (CUL1), RBX1 and SKP1 (Fig. 1c), but not other F-box proteins 
that we examined (Fig. 1b), resulted in a significant increase in the 
amount of MCL1 protein. T-cell-lineage-specific depletion of FBW7 
in Fbw7 conditional knockout (Lck-Cre/Fbw7^fl/fl) mice? resulted in 
increased MCL1 levels in the thymuses of these mice (Fig. 1d), as well 
as thymic lymphoma (Supplementary Fig. 1a) and the presence of acute 
leukaemia cells in the thymuses (Supplementary Fig. 1b). Consistent 
with a recent study'’, FBW7-/- human DLD1 cells (Fig. 1e) and HeLa 
cells treated with short interfering RNA (siRNA) directed against 
FBW7 (Supplementary Fig. 1c) have elevated MCL1 expression mainly 
in the mitosis (M) and early G1 phases of the cell cycle. 

The clinical relevance of this finding is further supported by the
observation that human T-ALL cell lines harbouring FBW7 mutations and/or
deletions have a significant increase in MCL1 (Fig. 1f). Additionally,
depletion of FBW7 in DND41 cells or Loucy cells (both of which have
wild-type FBW7) leads to increased MCL1 expression (Fig. 1g),
whereas reintroduction of wild-type FBW7 dramatically reduced
MCL1 expression in FBW7-deficient T-ALL cells (Fig. 1h), supporting
a causal relationship between loss of FBW7 activity and elevated MCL1
expression in the T-ALL cells examined. More importantly, elevated
MCL1 expression is also observed in both primary human and mouse
T-ALL samples with deficient FBW7 activity (Fig. 1i, j and
Supplementary Fig. 1a, b), and depletion of MCL1 impaired T-ALL
disease progression in vivo (Fig. 1k-m).

Consistent with a post-translational mode of regulation, no changes 
in MCL1 mRNA levels were observed after depletion of FBW7 in DLD1 
cells (Supplementary Fig. 2d), and no positive relationship was 
observed between MCL1 mRNA levels and loss of FBW7 in T-ALL 
cells (Supplementary Fig. 2e). The half-life of MCL1 was significantly 
extended in the thymuses of Fbw7-/- mice and FBW7-deficient
human T-ALL cells (Supplementary Fig. 3a-c), and experimental
manipulation of FBW7 levels changed MCL1 stability accordingly
(Supplementary Fig. 3d, e). Together, these results suggest that
MCL1 is a downstream ubiquitylation target of SCF(FBW7).

As the proper substrate phosphorylation events are required for
FBW7 to recognize and target its substrates for ubiquitylation, we
next investigated which phosphorylation events trigger MCL1 destruction
by FBW7. Mass spectrometry analysis showed that MCL1 is
phosphorylated at multiple sites in vivo (Fig. 2a and Supplementary
Fig. 5a-c). In addition to serine at position 159 (S159) and threonine at
position 163 (T163), S64 and S121 were also phosphorylated in
vivo. Consistent with previous reports, MCL1 destruction is promoted
by GSK3 (Fig. 2b) but not by the protein kinases ERK1 (also

Figure 1 | MCL1 stability is controlled by FBW7. a, Sequence alignment of
MCL1 with the phosphodegron sequences recognized by FBW7 in Jun, Myc
and cyclin E. The putative FBW7 phosphodegron sequence present in MCL1 is
conserved across different species. Conserved serine and threonine residues
within the degron sequences are shown in red, and conserved proline residues
are shown in blue. b, c, Immunoblotting (IB) analysis, with antibodies specific
for the indicated proteins (right), of HeLa cells transfected with siRNA
oligonucleotides directed against the indicated genes (top). d, IB analysis of
thymocytes derived from control (Lck-Cre/Fbw7(+/fl)) mice or Fbw7 conditional
knockout (Lck-Cre/Fbw7(fl/fl)) mice (whose thymocytes lack Fbw7). For the
histogram, MCL1 band intensity was normalized to HSP90 and then
normalized to the control lane. Data are shown as mean ± s.e.m. for three
independent experiments. e, IB analysis of wild-type (WT) and FBW7-/-
DLD1 cells after synchronization of the cell cycles with nocodazole and release
from mitotic arrest at the indicated time points. pS9-GSK3, GSK3 that is
phosphorylated at the S9 residue. f, IB analysis of the indicated human T-ALL
cell lines, which have either WT FBW7 or mutant FBW7 (a deletion (Del) or an
amino acid substitution). g, The human T-ALL cell lines DND41 and Loucy,
which contain wild-type FBW7, were infected with the indicated lentiviral
shRNA constructs and selected with 1 µg ml⁻¹ puromycin to eliminate the
non-infected cells. Cell lysates were collected for IB analysis. shFBW7, shRNA


known as MAPK3) and/or ERK2 (also known as MAPK1) (Sup- 
plementary Fig. 5d-f). To investigate further the significance of each 
individual phosphorylation site, we created a panel of MCL1 mutants 




specific for FBW7; shGFP, shRNA specific for the gene encoding green
fluorescent protein (GFP). h, Human T-ALL cell lines deficient in FBW7 were
infected with an FBW7-expressing retroviral construct (with empty vector (EV)
as a negative control) and selected with 1 µg ml⁻¹ puromycin to eliminate the
non-infected cells. Cell lysates were collected for IB analysis. HA,
haemagglutinin tag. i, IB analysis of the indicated primary human T-ALL
clinical samples. j, IB analysis of the indicated mouse T-ALL cell lines derived
from Terc-/- Atm-/- Tp53-/- (TKO) mice. k-m, In vivo effects of MCL1
depletion in FBW7-deficient T-ALL cells. An in vivo model of FBW7-deficient
T-ALL was created by orthotopic engraftment of luciferase-expressing CMLT1
cells in immunodeficient (NOD SCID Il2rg-null) mice. Mice were injected with
1 X 10 cells (n = 7 per group) through the lateral tail vein. Before engraftment, 
cells were infected with retroviral constructs expressing the indicated shRNAs. 
k, Representative images of luciferase expression (photonic flux, in number of 
photons s 'cm * sr ! X 10°) detected in live mice, which had received 
CMLT1-shGFP (left) or CMLT1-shMCL1 (right). l, IB analysis of the
engineered CMLT1 cell lines, showing the efficient depletion of MCL1.
m, Tumour burden was determined by quantification of total body
luminescence and is expressed as photons s⁻¹ ROI⁻¹. Data are presented as
mean ± s.e.m., with statistical significance determined by Student’s t-test.


(Fig. 2c). Using in vitro kinase assays, we identified S159 and T163 as
the major GSK3-mediated phosphorylation sites and S121 as a minor
GSK3-mediated phosphorylation site (Fig. 2d, e and Supplementary 

Figure 2 | Phosphorylation of MCL1 by GSK3 triggers the interaction of
MCL1 with FBW7. a, In vivo MCL1 phosphorylation sites detected by mass
spectrometry analysis. Phosphorylated residues are shown in red, with
phosphate in blue. b, IB analysis, with antibodies specific for the indicated
proteins (right), of HeLa cells transfected with siRNA oligonucleotides directed
against the indicated genes (top) (where GSK3α/β indicates depletion of GSK3A
and GSK3B with a single siRNA and GSK3α + GSK3β indicates depletion with
siRNAs targeting each gene separately). c, Illustration of the various MCL1
mutants generated for this study. Conserved serine and threonine residues
within the degron sequence are shown in red, and conserved proline residues
are shown in blue. 2A, MCL1 S159A/T163A; 3A, MCL1 S155A/S159A/T163A.
d, e, GSK3 phosphorylates MCL1 in vitro at multiple sites. Purified GSK3
protein was incubated with 5 µg of the indicated glutathione S-transferase
(GST)-MCL1 fusion proteins (top, WT and mutant as in c) in the presence of
[γ-³²P]ATP. The protein kinase reaction products were resolved by SDS-


Fig. 5g). Inactivation of these GSK3-mediated phosphorylation sites 
impairs the interaction between MCL1 and FBW7 both in vitro (Fig. 2f
and Supplementary Fig. 5h) and in vivo (Fig. 2g and Supplementary
Fig. 5i). Furthermore, pharmacological inhibition of GSK3 activity
blocked the interaction between HA-tagged FBW7 and endogenous
MCL1 (Fig. 2h) and inhibited the localization of FBW7 to the
mitochondria, where MCL1 resides (Supplementary Fig. 5j, k). These
results indicate that GSK3-dependent phosphorylation of MCL1 is
necessary for the interaction of MCL1 with FBW7. Consistent with
this FBW7-MCL1 regulatory axis, MCL1 specifically interacts with
FBW7 (Supplementary Fig. 6a, b, j-l) and CUL1 (Supplementary
Fig. 6c, d), and depletion of endogenous CUL1 increases MCL1
abundance (Supplementary Fig. 11a).

We next explored the mechanism by which FBW7 alters MCL1
stability. Overexpression of FBW7 and GSK3 significantly decreased 

PAGE, and phosphorylation was detected by autoradiography.
f, Phosphorylation of MCL1 at multiple sites by GSK3 triggers the interaction of
MCL1 with FBW7 in vitro. Autoradiograms show recovery of ³⁵S-labelled
FBW7 protein bound to the indicated GST-MCL1 fusion proteins (with GST
protein as a negative control) incubated with GSK3 before the pull-down
assays. g, IB analysis of whole-cell lysates (WCL) and immunoprecipitates (IP)
derived from 293T cells transfected with HA-FBW7 together with the
indicated Myc-MCL1 constructs (top). Thirty hours after transfection, cells
were pretreated with 10 µM MG132 for 10 h to block the proteasome pathway
before cell collection. h, IB analysis of WCL and IP derived from 293T cells
transfected with HA-FBW7. Thirty hours after transfection, cells were
pretreated with 20 µM MG132 for 8 h to block the proteasome pathway before
cell collection. Where indicated, 25 µM GSK3β inhibitor VIII (with
dimethylsulphoxide (DMSO) as a negative control) was added for 8 h before
cell collection.


MCL1 abundance (Fig. 3a and Supplementary Fig. 6h), whereas inactivation
of the major GSK3-dependent phosphorylation sites on
MCL1 impaired FBW7-mediated destruction (Fig. 3b and Supplementary
Fig. 6e-g). All FBW7 isoforms (particularly the α-isoform and the
γ-isoform) participate in MCL1 stability control, and FBW7 dimerization
is not required for the degradation of MCL1 (Supplementary Fig.
7a-e). Mutant FBW7 constructs derived from patients with T-ALL
showed a reduced ability to interact with MCL1 (Supplementary Fig.
6i) and were therefore unable to degrade MCL1 (Fig. 3c). Moreover,
the FBW7- and GSK3-mediated destruction of MCL1 was blocked by
the proteasome inhibitor MG132, indicating the involvement of the
ubiquitin-proteasome pathway in this process (Fig. 3a). In support of
this idea, co-expression of GSK3 and FBW7 resulted in a marked
reduction in the half-life of wild-type MCL1, but not of the 2A or
3A MCL1 mutants (Fig. 3d), which show reduced interaction with FBW7

Figure 3 | FBW7 promotes MCL1 ubiquitylation and destruction in a
GSK3-mediated phosphorylation-dependent manner. a-c, GSK3-mediated
phosphorylation-dependent degradation of MCL1 by FBW7. IB analysis of
293T cells transfected with plasmids expressing the indicated Myc-MCL1 and
HA-FBW7 proteins in the presence or absence of HA-GSK3 (top), with
antibodies specific for the Myc tag, HA tag, GFP or tubulin (right). A plasmid
encoding GFP was used as a negative control for transfection efficiency. Where
indicated, the proteasome inhibitor MG132 was added. d, 293T cells were
transfected with the indicated Myc-MCL1 constructs together with the
HA-FBW7- and HA-GSK3-expressing plasmids. Twenty hours after transfection,
cells were split into 60-mm dishes. After another 20 h, cells were treated with
20 µg ml⁻¹ cycloheximide (CHX). At the indicated time points, WCL were
prepared, and IB analysis was carried out with antibodies specific for the


(Fig. 2g). Furthermore, loss of FBW7 extends the half-life of endogenous
MCL1 (Fig. 3e), and FBW7 promotes the ubiquitylation of MCL1
in a GSK3-dependent manner (Fig. 3f and Supplementary Fig. 8a, b, e).
The decrease in MCL1 expression in response to various
DNA-damaging agents is also impaired in FBW7-/- DLD1 cells (Fig. 3g
and Supplementary Fig. 8f). Together, these data suggest a physio- 
logical role for FBW7 in promoting MCL1 destruction in vivo in a 
GSK3-mediated phosphorylation-dependent manner. 

Next, we explored how FBW7 affects the cellular apoptotic response
by modulating MCL1 abundance. As predicted, Fbw7-/- mouse thymocytes
and FBW7-deficient human T-ALL cells with increased
MCL1 levels were less sensitive to apoptotic stimuli (Supplementary
Fig. 9a-f). More interestingly, compared with T-ALL cell lines that had
wild-type FBW7, FBW7-deficient T-ALL cells with elevated MCL1
expression (Fig. 1f and Supplementary Fig. 9h) were more sensitive
to the multi-kinase inhibitor sorafenib, which can effectively reduce
MCL1 expression (Fig. 4a and Supplementary Fig. 9g-i). Although
the ability of sorafenib to repress MCL1 has been attributed to the

analysis was carried out with antibodies specific for the indicated proteins.
Bottom, MCL1 band intensity was normalized to tubulin and then normalized
to the t = 0 controls. f, IB of WCL and His tag pull-down of HeLa cells
transfected with plasmids expressing the indicated proteins. Twenty hours after
transfection, cells were treated with the proteasome inhibitor MG132 for 12 h
before cell collection. His tag pull-down was performed in the presence of 8 M
urea to eliminate any possible contamination from MCL1-associated proteins.
Ni-NTA, nickel-nitrilotriacetic acid; Ub, ubiquitin. g, Top, IB analysis of WT
and FBW7-/- DLD1 cells treated with 10 µM adriamycin (ADR) for the
indicated time durations. Bottom, MCL1 band intensity was normalized to
tubulin and then normalized to the t = 0 controls.


inactivation of the RAF-ERK pathway and/or the activation of GSK3
activity, the exact mechanism remains unclear. Nonetheless, these
data suggest that FBW7-deficient T-ALL cell lines might require elevated
levels of MCL1 to evade apoptosis, a phenotype known as ‘oncogene
addiction’. By contrast, FBW7-deficient T-ALL cell lines were
more resistant to ABT-737 (Fig. 4a and Supplementary Fig. 9g, j).
ABT-737 is a BH3 domain mimetic and a pan inhibitor of the BCL2
family of anti-apoptotic proteins, and it is reported to kill leukaemia
cells effectively. However, leukaemia cells with elevated MCL1 levels
are refractory to treatment with ABT-737 (refs 23, 24), primarily
because ABT-737 fails to inactivate MCL1 (ref. 22). Experimental
evidence from both double staining with 7-amino-actinomycin D 
(7-AAD) and annexin V (Supplementary Fig. 9j) and immunoblotting 
specific for apoptotic biomarkers (Fig. 4b) suggests that ABT-737- 
induced apoptosis is impaired in FBW7-deficient T-ALL cells. 
Moreover, specific depletion of MCL1 in multiple FBW7-deficient 
T-ALL cell lines restored the sensitivity of these cells to ABT-737 
(Fig. 4c, d), supporting the idea that increased MCL1 expression is 

Figure 4 | Elevated MCL1 expression protects FBW7-deficient T-ALL cell
lines from ABT-737-induced apoptosis. a, Cell viability assays showing that
FBW7-deficient human T-ALL cell lines were more sensitive to sorafenib but
were relatively resistant to ABT-737 treatment. T-ALL cells were cultured in
10% FBS-containing medium with the indicated concentrations of sorafenib or
ABT-737 for 48 h before cell viability assays were performed. Data are shown as
mean ± s.d. for three independent experiments. b, IB analysis of the indicated
human T-ALL cell lines with or without ABT-737 (0.8 µM) treatment. PARP,
poly(ADP-ribose) polymerase. c, Specific depletion of endogenous MCL1
expression restored sensitivity to ABT-737 in the indicated FBW7-deficient
human T-ALL cell lines. Various T-ALL cell lines were infected with lentiviral
shGFP- or shMCL1-encoding vectors and selected in 0.5 µg ml⁻¹ puromycin to
eliminate non-infected cells. The generated cell lines were cultured in 10%
FBS-containing medium with the indicated concentrations of ABT-737 for 48 h
before cell viability assays were performed (right) or with or without ABT-737
(0.8 µM) treatment for 24 h before WCL were collected for IB analysis with


the primary cause of desensitization to ABT-737 in vivo. It also
suggests that patients with FBW7-deficient T-ALL will not respond 
well to treatment with ABT-737. We further demonstrated that mani- 
pulation of FBW7 activity or ectopic expression of a non-degradable 
form of MCL1 in human T-ALL cells affects their sensitivity to ABT- 
737 (Supplementary Fig. 10a, b) and responses to other apoptotic 
stimuli (Supplementary Fig. 10c-f). 

antibodies specific for the indicated proteins (left). For cell viability assays, data
are shown as mean ± s.d. for three independent experiments. d, Double
staining with 7-AAD and annexin-V-PE (annexin V conjugated to
phycoerythrin), followed by flow cytometry analysis to detect the percentage of
apoptotic cells (axes indicate intensity of fluorochrome). In the indicated
FBW7-deficient human T-ALL cell lines, endogenous MCL1 was depleted by
infection with lentiviral vectors encoding shRNA (lentiviral shGFP was used as
a negative control). Cell lines were cultured in 10% FBS-containing medium
with or without ABT-737 (0.8 µM) treatment, with DMSO as a negative
control, for 48 h before the flow cytometry analysis. Purple numbers indicate 
the percentage of apoptotic cells. e, Staining and flow cytometry analysis as in 
d, demonstrating that sorafenib treatment restores ABT-737 sensitivity to 
FBW7-deficient HPB-ALL cells. HPB-ALL cells were cultured in 10% FBS- 
containing medium with the indicated concentrations of sorafenib and/or 
ABT-737 for 48 h before analysis. Coloured numbers indicate the percentage of 
apoptotic cells. 


Our results indicate that inhibition of MCL1 could be used to restore 
sensitivity to ABT-737 in FBW7-deficient T-ALL cells. Given that the 
clinical application of siRNA- or short hairpin RNA (shRNA)-mediated 
target extinction is not yet feasible owing to delivery challenges, we 
instead exploited small molecule strategies to reduce MCL1 expression, 
specifically with the use of sorafenib (Supplementary Fig. 9h). The 
combined use of sorafenib and ABT-737 produced a dose-dependent 
increase in the sensitivity of HPB-ALL cells, a human T-ALL cell line, to
ABT-737 (Supplementary Fig. 10g), and this increased sensitivity correlated with a
significant increase in the induction of apoptosis (Fig. 4e). Similar 
results were obtained for other FBW7-deficient T-ALL cell lines 
(Supplementary Fig. 10h). 

Our studies provide experimental evidence of a role for FBW7 in
governing the apoptotic pathway by controlling MCL1 destruction.
MCL1 has a key role in regulating the apoptosis of T cells but not
of cells from other tissue types, such as liver cells. Therefore, our studies
also provide a possible mechanistic explanation for why loss of FBW7 is
frequently seen in patients with T-ALL. Although other E3 ubiquitin
ligases, including MULE and β-transducin-repeat-containing protein
(β-TRCP), have been implicated in controlling MCL1 stability,
MULE activity was not implicated in the GSK3-dependent regulation
of MCL1 (refs 17, 25) (Supplementary Fig. 11a-e). Additionally, no
correlation was found between MULE and MCL1 expression in various
T-ALL cells (Supplementary Fig. 11f), thereby excluding a physiological
role for MULE in regulating MCL1 abundance in T-ALL cells.
We further found that depletion of FBW7, but not β-TRCP, leads to a
significant induction of MCL1 expression (Fig. 1b and Supplementary
Fig. 11a-c). Array comparative genomic hybridization analysis demonstrated
a high frequency of FBW7 loss but not simultaneous loss of
BTRC1 and BTRC2, which encode β-TRCPs, in T-ALL cells (data not
shown). Together, these data support the hypothesis that SCF(FBW7) is a
physiological E3 ubiquitin ligase for MCL1, with USP9X being the
nominated deubiquitylase, and that loss of FBW7 contributes to
T-ALL development through the upregulation of MCL1 expression.
More importantly, our studies suggest that there is a correlation
between FBW7 genetic status and sensitivity to ABT-737, and they
provide insight into the use of MCL1 inhibitors as a practical method
for specifically killing FBW7-deficient T-ALL cells. This work provides 
a basis for the rational treatment of patients with T-ALL and provides 
motivation for the development of specific MCL1 antagonists, or agents 
that significantly reduce MCL1 expression, for the improved manage- 
ment of patients with T-ALL. 


METHODS SUMMARY 


Expression plasmid constructs, proteins, antibodies and cell lines are described in 
the Methods. The sequences of various siRNA oligonucleotides used in this study 
are also listed in the Methods. In vivo phosphorylation of MCL1 was detected by 
mass spectrometry analysis, and the major GSK3-dependent phosphorylation 
sites that were identified were subsequently examined by in vitro kinase assays. 
All mutants were generated using PCR, and the sequences were verified. FBW7- 
mediated MCL1 ubiquitylation and destruction were examined by cell-based 
ubiquitylation and degradation assays. Cell viability assays were used to detect 
the response of various T-ALL cell lines to sorafenib and ABT-737. Double stain- 
ing with annexin V and 7-AAD was used to detect the percentage of apoptotic 
cells. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Plasmids. HA-FBW7 and HA-GSK3 constructs were described previously.
Human FBW7 cDNA was subcloned using Pfu polymerase (Stratagene) into the
pBabe-Puro-HA retrovirus vector. Myc-MCL1 WT, Myc-MCL1 3A, and GST-MCL1
WT constructs were gifts from M.-C. Hung. FBW7 and MCL1 mutants
were generated with the QuikChange XL Site-Directed Mutagenesis Kit
(Stratagene) according to the manufacturer’s instructions. HA-ERK1, shERK1
and shERK2 constructs were gifts from J. Blenis. Flag-β-TRCP1, Flag-Ub,
shTRCP1 and shTRCP1+2 retroviral constructs were gifts from W. Harper.
The shFBW7 retroviral vector (Addgene) was validated and described
previously. To generate the lentiviral shFBW7 and shMULE vectors, DNA
oligonucleotides encoding shRNA directed against FBW7 and MULE were annealed and
subcloned into AgeI and EcoRI sites of the pLKO lentiviral plasmid. The following
are DNA oligonucleotide sequences for the FBW7-directed shRNA (sense,
5'-CCGGAACCTTCTCTGGAGAGAGAAACTCGAGTTTCTCTCTCCAGAGAAGGTTTTTTTG-3';
antisense,
5'-AATTCAAAAAAACCTTCTCTGGAGAGAGAAACTCGAGTTTCTCTCTCCAGAGAAGGTT-3'),
and for MULE-directed shRNA (sense,
5'-CCGGAATTGCTATGTCTCTGGGACACTCGAGTGTCCCAGAGACATAGCAATTTTTTTG-3';
antisense,
5'-AATTCAAAAAAATTGCTATGTCTCTGGGACACTCGAGTGTCCCAGAGACATAGCAATT-3').
Lentiviral shRNA constructs against GFP and MCL1 were obtained from W. Hahn. WT
MCL1 and 3A MCL1 cDNAs were amplified with PCR and subcloned into the
BamHI and SalI sites of the pLenti-GFP-Puro construct (Addgene, catalogue
number 658-5).
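
The two shRNA oligonucleotide pairs listed above share one layout: a 21-nucleotide target, a CTCGAG loop, the reverse complement of the target and a thymidine-rich terminator, flanked by overhangs for the AgeI and EcoRI sites. The following minimal Python sketch rebuilds such a pair from a target sequence. The layout is inferred from the listed FBW7 and MULE sequences only, and the function names are hypothetical; it is an illustration, not the authors' design procedure.

```python
# Sketch: rebuild an annealed-oligo pair for an shRNA insert from its target.
# The layout (overhangs, loop, terminator) is inferred from the FBW7 and MULE
# oligonucleotides listed in the text; all names here are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
LOOP = "CTCGAG"  # palindromic loop present in both listed hairpins


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_shrna_oligos(target: str) -> tuple[str, str]:
    """Return (sense, antisense) oligos, both written 5' to 3'."""
    target = target.upper()
    if not target or set(target) - set("ACGT"):
        raise ValueError("target must be a non-empty DNA sequence (A, C, G, T)")
    stem_loop = target + LOOP + reverse_complement(target)
    sense = "CCGG" + stem_loop + "TTTTTG"   # 5' overhang for the AgeI site
    antisense = "AATTCAAAAA" + stem_loop    # 5' overhang for the EcoRI site
    return sense, antisense


if __name__ == "__main__":
    # 21-nt FBW7 target taken from the sense oligonucleotide listed above.
    sense, antisense = design_shrna_oligos("AACCTTCTCTGGAGAGAGAAA")
    print(sense)
    print(antisense)
```

Because the loop is palindromic, the antisense oligonucleotide is simply the AATT overhang followed by the reverse complement of the sense oligonucleotide without its CCGG overhang.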

Antibodies and reagents. Anti-Myc antibody (catalogue number sc-40), polyclonal
anti-HA antibody (sc-805), anti-cyclin A antibody (sc-751), anti-PLK1
antibody (sc-17783), anti-CUL1 antibody (sc-70895), anti-RICTOR antibody
(sc-81538), anti-p27 antibody (sc-528), anti-SKP1 antibody (sc-7163),
anti-MCL1 antibody (sc-819) and anti-cyclin E antibody (sc-247) were purchased
from Santa Cruz Biotechnology. Anti-tubulin antibody (T-5168), polyclonal
anti-Flag antibody (F2425), monoclonal anti-Flag antibody (F-3165),
anti-β-catenin antibody (C7207), anti-vinculin antibody (V9131), peroxidase-conjugated
anti-mouse secondary antibody (A4416) and peroxidase-conjugated anti-rabbit 
secondary antibody (A4914) were purchased from Sigma. Anti-MCL1 antibody 
(4572), anti-BCL2 antibody (2872), anti-COX IV antibody (4850), anti-cleaved 
caspase 3 (Asp175) antibody (9661), anti-cleaved PARP (Asp214) antibody 
(9541), anti-ERK1/2 antibody (4695), anti-Jun antibody (9162), anti-phospho- 
GSK3 (Ser9) antibody (9336) and anti-BIM antibody (4582) were purchased 
from Cell Signaling Technology. Anti-MULE antibody (A300-486A) was pur- 
chased from Bethyl. Monoclonal anti-HA antibody (MMS-101P) was purchased 
from Covance. Anti-RBX1 antibody (RB-069P1) was purchased from NeoMarker. 
Another anti-MCL1 antibody (559027) was purchased from BD Pharmingen. 
Anti-GFP antibody (632380) and another anti-CUL1 antibody (32-2400) were 
purchased from Invitrogen. Anti-CDH1 antibody (CC43) was purchased from 
Oncogene. Oligofectamine, Lipofectamine and Plus reagents were purchased from 
Invitrogen. GSK3β inhibitor VIII was purchased from Calbiochem.

siRNAs. Human siRNA oligonucleotides directed against FBW7, SKP2, CDH1
and CUL1 have been described previously. A human siRNA oligonucleotide
that can deplete both β-TRCP1 and β-TRCP2 (sense,
5'-AAGUGGAAUUUGUGGAACAUC-3') was purchased from Dharmacon. Human siRNA
oligonucleotides directed against MULE (MULE-A: sense,
5'-CAUGCCGCAAUCCAGACAUAU-3') and (MULE-B: sense, 5'-AAUUGCUAUGUCUCUGGGACA-3')
have been validated previously and were purchased from Dharmacon. Luciferase
GL2 siRNA oligonucleotide was purchased from Dharmacon. siRNA oligonucleotides
to deplete endogenous RBX1 (sense, 5'-AACUGUGCCAUCUGCAGGAACAA-3'),
CUL1 (sense, 5'-GGUCGCUUCAUAAACAACAUU-3') and
RICTOR (sense, 5'-AAACUUGUGAAGAAUCGUAUCUU-3') were synthesized
by Dharmacon. Cocktailed siRNAs targeting SKP1 were purchased from Invitrogen
(1299003). A GSK3α-depleting siRNA oligonucleotide (6312) and a
GSK3α/β-depleting siRNA oligonucleotide (6301) were purchased from Cell Signaling
Technology. The GSK3β-depleting siRNA oligonucleotide (51012) was purchased
from Ambion. As described previously, siRNA oligonucleotides were transfected
into subconfluent cells with Oligofectamine or Lipofectamine 2000 (Invitrogen)
according to the manufacturer’s instructions.

Cell culture. Cell culture including synchronization and transfection has been
described previously. Wild-type and FBW7-/- DLD1 cell lines were gifts from
B. Vogelstein. Mouse T-ALL cell lines derived from Tal1-transgenic mice were gifts
from M. A. Kelliher. Human T-ALL cell lines were previously described. Loucy and
CMLT1 T-ALL cell lines were obtained from J. Aster. For various assays described
below, as indicated in the figure legends, T-ALL cells were cultured in either 0.5%
FBS or 10% FBS-containing medium for sorafenib (ALEXIS Biochemicals) or ABT-737
(Symansis) treatment. In the case of combined treatment with both sorafenib
and ABT-737, T-ALL cells were maintained in 10% FBS-containing medium.
Lentiviral shRNA virus packaging, retrovirus packaging and subsequent infections
were performed as described previously. For cell viability assays, cells were plated
at 10,000 per well in 96-well plates, and incubated with the appropriate medium
containing sorafenib, ABT-737 or DMSO for 48 h. Assays were performed with
CellTiter-Glo Luminescent Cell Viability Assay kit (Promega) according to the
manufacturer’s instructions. For detection of apoptosis, cells treated with various
drugs were stained with propidium iodide (Roche) or co-stained with annexin-V-PE
and 7-AAD (Annexin V-PE Apoptosis Detection Kit I, BD Bioscience) according
to the manufacturer’s instructions. Stained cells were sorted with a
DakoCytomation MoFlo sorter (Dako) at the Dana-Farber Cancer Institute FACS core
facility.
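
As an illustration of how luminescence readings from such a viability assay might be summarized, the sketch below expresses treated wells as a percentage of the DMSO wells and reports the mean and s.d. across independent experiments, as in the figure legends. Normalization to the DMSO control is an assumption (the text does not state how values were scaled), and the function names and numbers are hypothetical.

```python
# Sketch: relative viability from luminescence readings.
# Assumption: treated wells are scaled to the mean of the DMSO (vehicle) wells.
# Function names and example numbers are illustrative only.
from statistics import mean, stdev


def relative_viability(treated: list[float], vehicle: list[float]) -> list[float]:
    """Express each treated reading as a percentage of the mean vehicle reading."""
    baseline = mean(vehicle)
    if baseline <= 0:
        raise ValueError("vehicle readings must have a positive mean")
    return [100.0 * value / baseline for value in treated]


def mean_sd(per_experiment: list[float]) -> tuple[float, float]:
    """Mean and sample s.d. across independent experiments (needs at least two)."""
    return mean(per_experiment), stdev(per_experiment)


if __name__ == "__main__":
    # Hypothetical luminescence counts from one experiment at one drug dose.
    wells = relative_viability(treated=[4100.0, 3900.0], vehicle=[8000.0, 8200.0, 7800.0])
    print(wells)
    # Hypothetical per-experiment viabilities (%) from three experiments.
    print(mean_sd([52.0, 48.0, 50.0]))
```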

Immunoblotting and immunoprecipitation. Cells were lysed in EBC buffer 
(50 mM Tris, pH 8.0, 120 mM NaCl and 0.5% NP-40) supplemented with protease 
inhibitors (Complete Mini, Roche) and phosphatase inhibitors (phosphatase 
inhibitor cocktail set I and II, Calbiochem). The protein concentrations of the 
lysates were measured using the Bradford Protein Assay reagent (Bio-Rad) on a 
DU 800 spectrophotometer (Beckman Coulter). The lysates were then resolved by 
SDS-PAGE and immunoblotted with the indicated antibodies. For
immunoprecipitation, 800 µg of lysate was incubated with the appropriate antibody (1-2 µg)
for 3-4 h at 4 °C followed by 1 h incubation with protein-A sepharose beads (GE
Healthcare). Immuno-complexes were washed five times with NETN buffer
(20 mM Tris, pH 8.0, 100 mM NaCl, 1 mM EDTA and 0.5% NP-40) before being
resolved by SDS-PAGE and immunoblotted with the indicated antibodies.
Quantification of the immunoblot band intensity was performed with ImageJ
software.
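
The figure legends describe a two-step normalization of band intensities: each MCL1 band is first divided by its loading control (HSP90 or tubulin) and then by the control lane, with data shown as mean ± s.e.m. for three independent experiments. A minimal Python sketch of that arithmetic follows. The function names and example intensities are hypothetical, and the sketch is not ImageJ code.

```python
# Sketch: two-step normalization of immunoblot band intensities.
# Step 1: divide the target band by the loading-control band of the same lane.
# Step 2: divide by the control lane so that the control lane equals 1.
# Function names and numbers are illustrative only.
from math import sqrt
from statistics import mean, stdev


def normalize_bands(target: list[float], loading: list[float],
                    control_index: int = 0) -> list[float]:
    """Return target/loading ratios rescaled to the chosen control lane."""
    if len(target) != len(loading):
        raise ValueError("target and loading must have one value per lane")
    ratios = [t / l for t, l in zip(target, loading)]
    reference = ratios[control_index]
    return [ratio / reference for ratio in ratios]


def mean_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean across independent experiments."""
    return mean(values), stdev(values) / sqrt(len(values))


if __name__ == "__main__":
    # Hypothetical intensities: lane 0 is the control, lane 1 the knockout.
    print(normalize_bands(target=[1200.0, 3300.0], loading=[1000.0, 1100.0]))
    # Hypothetical fold changes for the knockout lane in three experiments.
    print(mean_sem([2.5, 3.0, 3.5]))
```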

Detection of MCL1 phosphorylation sites in vivo. To map MCL1 phosphorylation
status in vivo, 293T cells were transfected with HA-MCL1 using the calcium
phosphate method. Thirty hours after transfection, 293T cells were treated with
10 µM MG132 for 16 h to block the 26S proteasome pathway before collecting
whole-cell lysates for HA-immunoprecipitation. After extensive washing with
NETN buffer, the HA-immunoprecipitates were separated by SDS-PAGE and
visualized with colloidal Coomassie blue. The band containing MCL1 was excised
and treated with dithiothreitol (DTT) to reduce disulphide bonds and iodoacetamide
to derivatize cysteine residues. In-gel digestion of the protein was done using trypsin
or chymotrypsin. The resultant peptides were extracted from the gel and analysed by
nanoscale-microcapillary reversed phase liquid chromatography tandem mass
spectrometry (LC-MS/MS). Peptides were separated across a 37-min gradient ranging
from 4% to 27% (v/v) acetonitrile in 0.1% (v/v) formic acid in a microcapillary
(125 µm × 18 cm) column packed with C18 reversed-phase material (Magic
C18AQ, 5 µm particles, 200 Å pore size, Michrom Bioresources) and analysed online
on the LTQ Orbitrap XL hybrid FTMS (Thermo Scientific). For each cycle, one full
MS scan acquired on the Orbitrap at high mass resolution was followed by ten MS/MS
spectra on the linear ion trap XL from the ten most abundant ions. MS/MS
spectra were searched using the SEQUEST algorithm against a database that was
created based on a protein sequence database containing the sequence for MCL1
and common contaminants, such as human keratin proteins, with
static modification of cysteine carboxymethylation and dynamic modification of
methionine oxidation and serine, threonine and tyrosine phosphorylation. All peptide
matches were filtered based on mass deviation, tryptic state, XCorr and dCn and
confirmed by manual validation. The reliability of site localization of
phosphorylation events was evaluated using the Ascore algorithm.

Real-time RT-PCR analysis. RNA was extracted using the RNeasy mini kit 
(Qiagen), and the reverse transcription (RT) reaction was performed using 
TaqMan Reverse Transcription Reagents (ABI, N808-0234). After mixing the 
resultant template with MCL1 (Hs00172036_m1) or GAPDH (Hs99999905_m1) 
primers and TaqMan Fast Universal PCR Master Mix (ABI, 4352042), the real- 
time RT-PCR was performed with the 7500 Fast Real-time PCR system (ABI). 
FBW7 (Hs00217794_m1), SKP2 (Hs00180634_m1), BTRC1 (Hs00182707_m1),
MCL1 (Hs00172036_m1) and GAPDH (Hs99999905_m1) primers were pur- 
chased from ABI. 
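
The text does not state how threshold-cycle (Ct) values were converted into relative MCL1 mRNA levels. One common approach for TaqMan data with a GAPDH reference is the comparative Ct calculation, sketched below in Python. Its use here is an assumption, as are the function name, the example Ct values and the premise of roughly 100% amplification efficiency for both assays.

```python
# Sketch: comparative Ct (2^-ddCt) calculation of relative mRNA levels.
# Assumption: this is one common analysis for such data; the text does not
# say which method was used. Names and Ct values are illustrative only.

def relative_expression(ct_target_sample: float, ct_reference_sample: float,
                        ct_target_control: float, ct_reference_control: float) -> float:
    """Fold change of the target gene in a sample relative to a control.

    Each target Ct is first corrected by the reference-gene Ct (for example
    GAPDH), and the sample is then compared with the control condition.
    Assumes roughly 100% amplification efficiency for both assays.
    """
    delta_sample = ct_target_sample - ct_reference_sample
    delta_control = ct_target_control - ct_reference_control
    return 2.0 ** -(delta_sample - delta_control)


if __name__ == "__main__":
    # Hypothetical Ct values: an unchanged target gives a fold change of 1.
    print(relative_expression(24.0, 18.0, 24.0, 18.0))
```

Under this calculation, a post-translational effect such as the one described in the main text would appear as a fold change close to 1 at the mRNA level despite increased protein abundance.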

Protein degradation analysis. Cells were transfected with Myc-MCL1 along with 
HA-FBW7 or Flag-β-TRCP1, and GFP as a negative control, in the presence or
absence of HA-GSK3 and/or HA-ERK1. For half-life studies, cycloheximide
(20 µg ml⁻¹; Sigma) was added to the media 40 h after transfection. At various
time points thereafter, cells were lysed, and protein abundances were measured by 
immunoblotting analysis. 
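
A half-life can be estimated from such a cycloheximide chase by fitting the normalized band intensities to a decay curve. The Python sketch below assumes simple first-order decay and fits a straight line to the logarithm of the intensities. The decay model, the function name and the example values are assumptions for illustration; the text does not describe how half-lives were calculated.

```python
# Sketch: half-life estimate from a cycloheximide chase.
# Assumption: first-order decay, I(t) = I0 * exp(-k * t), fitted by least
# squares on ln(I). Function name and data are illustrative only.
import math


def estimate_half_life(times_h: list[float], intensities: list[float]) -> float:
    """Return the half-life (in hours) from normalized band intensities."""
    if len(times_h) != len(intensities) or len(times_h) < 2:
        raise ValueError("need matching time points and intensities (at least two)")
    if any(value <= 0 for value in intensities):
        raise ValueError("intensities must be positive to take logarithms")
    logs = [math.log(value) for value in intensities]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    slope = sxy / sxx  # equals -k for a decaying signal
    if slope >= 0:
        raise ValueError("no decay detected; half-life is undefined")
    return math.log(2) / -slope


if __name__ == "__main__":
    # Hypothetical intensities normalized to the t = 0 lane.
    print(estimate_half_life([0.0, 1.0, 2.0, 4.0], [1.0, 0.7, 0.5, 0.25]))
```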

In vivo ubiquitylation assay. Cells were transfected with a plasmid encoding 
Flag-Ub along with Myc-MCL1 and HA-FBW7 in the presence or absence of 
HA-GSK3. Thirty-six hours after transfection, cells were treated with the proteasome 
inhibitor MG132 (30 µM; Calbiochem) for 6 h and then collected. Anti-Myc 
immunoprecipitates were recovered and immunoblotted with anti-Flag antibody. 
Alternatively, cells were transfected with His–Ub along with Myc-MCL1 and HA- 
FBW7 in the presence or absence of HA-GSK3. Thirty-six hours after transfection, 
cells were collected, and the lysates were incubated with Ni-NTA matrices 
(Qiagen) at 4 °C for 12 h in the presence of 8 M urea, pH 7.5. Immobilized proteins 
were washed five times with 8 M urea, pH 6.3, before being resolved by SDS-PAGE 
and immunoblotted with anti-Myc antibody. 

In vitro ubiquitylation assay. The in vitro ubiquitylation assays were performed 
as described previously*. To purify the SCF^FBW7 E3 ligase complex, 293T cells were 
transfected with vectors encoding GST-FBW7, HA-CUL1, Myc-SKP1 and Flag- 
RBX1. The SCF^FBW7 E3 complexes were purified from the whole-cell lysates using 
GST-agarose beads. Purified, recombinant GST-MCL1 proteins were incubated 
with purified SCF^FBW7 complexes in the presence of purified, recombinant active 
E1, E2 (UBCH5A and UBCH3), ATP and ubiquitin. The reactions were stopped 
by the addition of 2× SDS-PAGE sample buffer, and the reaction products were 
resolved by SDS-PAGE gel and probed with the indicated antibodies. 

In vitro kinase assay. GSK3 was purchased from New England Biolabs. The in 
vitro kinase reaction was performed according to the manufacturer’s instructions. 
Briefly, 5 µg indicated GST fusion proteins were incubated with purified active 
GSK3 in the presence of 5 µCi [γ-³²P]ATP and 200 µM cold ATP in the kinase 
reaction buffer for 20 min. The reaction was stopped by the addition of SDS- 
containing lysis buffer, the proteins resolved by SDS-PAGE and phosphorylation 
detected by autoradiography. 

MCL1-binding assays. Binding to immobilized GST proteins was performed as 
described previously**. Where indicated, the GST-MCL1 proteins were incubated 
with GSK3 in the presence of ATP for 1 h before the binding assays. 

Subcellular fractionation. Mitochondrial and cytosolic (S100) fractions were 
prepared by resuspending HeLa cells in 0.8 ml ice-cold buffer A (250 mM sucrose, 
20 mM HEPES, pH 7.4, 10 mM KCl, 1.5 mM MgCl2, 1 mM EDTA, 1 mM EGTA, 
1 mM DTT, 17 µg ml⁻¹ phenylmethylsulphonyl fluoride, 8 µg ml⁻¹ aprotinin, 
2 µg ml⁻¹ leupeptin). Cells were then passed through an ice-cold cylinder cell 
homogenizer. Unlysed cells and nuclei were pelleted by a 10 min, 750 g spin. 
The recovered supernatant was spun at 10,000 g for 25 min. This pellet was resus- 
pended in buffer A and represents the mitochondrial fraction. The supernatant 
was spun at 100,000 g for 1h. The supernatant from this final centrifugation 
represents the S100 (cytosolic) fraction. 

Mice. Generation of conditional Fbw7 knockout mice (Lck-Cre/Fbw7^fl/fl and 
Mx1-Cre/Fbw7^fl/fl) was described previously*“. 

In vivo imaging. CMLT1 cells were infected with lentiviral vectors encoding a 
shRNA directed against MCL1 (shMCL1) or an irrelevant control (shGFP). After 
selection in 1 µg ml⁻¹ puromycin, cells were engineered for in vivo imaging by 
transduction with a retrovirus encoding firefly luciferase fused to neomycin 
phosphotransferase and were then selected with 0.5 mg ml⁻¹ G418. After 
selection, the luciferase activity of each engineered cell line was measured and found 
to have a similar reading. Subsequently, equal numbers of viable cells (0.5-1 X 10” 
cells) were injected into NOD SCID Il2rg-null mice through the lateral tail vein. 
Tumour burden was determined using bioluminescence imaging (IVIS Spectrum, 
Caliper Life Sciences) after intraperitoneal injection of 75 mg kg⁻¹ D-luciferin. Total 
body luminescence was quantified using the Living Image software package (Caliper 
Life Sciences) and is expressed as photons per second per standardized region of 
interest (photons s⁻¹ ROI⁻¹), encompassing the entire mouse. Data are presented as 
mean ± s.e.m. with statistical significance determined by Student’s t-test. 

Microtubules have pivotal roles in fundamental cellular processes 
and are targets of antitubulin chemotherapeutics'. Microtubule- 
targeted agents such as Taxol and vincristine are prescribed widely 
for various malignancies, including ovarian and breast adenocarci- 
nomas, non-small-cell lung cancer, leukaemias and lymphomas’. 
These agents arrest cells in mitosis and subsequently induce cell 
death through poorly defined mechanisms’. The strategies that res- 
istant tumour cells use to evade death induced by antitubulin agents 
are also unclear’. Here we show that the pro-survival protein MCL1 
(ref. 3) is a crucial regulator of apoptosis triggered by antitubulin 
chemotherapeutics. During mitotic arrest, MCL1 protein levels 
decline markedly, through a post-translational mechanism, poten- 
tiating cell death. Phosphorylation of MCL1 directs its interaction 
with the tumour-suppressor protein FBW7, which is the substrate- 
binding component of a ubiquitin ligase complex. The polyubiquitylation 
of MCL1 then targets it for proteasomal degradation. The 
degradation of MCL1 was blocked in patient-derived tumour cells 
that lacked FBW7 or had loss-of-function mutations in FBW7, 
conferring resistance to antitubulin agents and promoting chemo- 
therapeutic-induced polyploidy. Additionally, primary tumour 
samples were enriched for FBW7 inactivation and elevated MCL1 
levels, underscoring the prominent roles of these proteins in onco- 
genesis. Our findings suggest that profiling the FBW7 and MCL1 
status of tumours, in terms of protein levels, messenger RNA levels 
and genetic status, could be useful to predict the response of patients 
to antitubulin chemotherapeutics. 

BCL2 family proteins are key regulators of cell survival and can either 
promote or inhibit cell death’. Pro-survival members, including BCL-XL 
and MCL1, inhibit apoptosis by blocking the cell death mediators 
BAX and BAK (also known as BAK1). When uninhibited, BAX and 
BAK permeabilize the outer mitochondrial membranes, which releases 
pro-apoptotic factors that activate caspases, the proteases that catalyse 
cellular demise. This intrinsic, or mitochondrial, pathway is initiated by 
the damage-sensing BH3-only proteins, including BIM (encoded by 
BCL2L11) and NOXA (also known as PMAIP1), which neutralize the 
pro-survival family members when cells are irreparably damaged*. 

Because aberrant expression of pro-survival BCL2 family proteins 
promotes tumorigenesis and resistance to chemotherapeutics*, we 
evaluated whether these proteins regulate the cell death induced by 
antitubulin agents. Multiple lineages of Bax−/− Bak−/− mouse embryonic 
fibroblasts (MEFs) were resistant to killing by Taxol or nocodazole, 
whereas wild-type (WT) MEFs were significantly more sensitive 
to such killing (Fig. 1a and Supplementary Fig. 2a–e). These results 
were confirmed in myeloid cells (Fig. 1b). As inhibitor of apoptosis 
(IAP) proteins’ do not have a significant role in the cellular response to 
antitubulin agents (Supplementary Fig. 3), we conclude that BCL2 
family proteins are key regulators of antitubulin-agent-induced cell 
death in diverse cell types. 

Next we determined the sensitivity of MEFs lacking individual BCL2 
family members to killing by Taxol or vincristine, two mechanistically 
distinct antitubulin chemotherapeutics. Bclx−/− cells were more sensitive 
to Taxol than were WT cells, and Mcl1−/− cells showed greater 
sensitivity than WT cells to Taxol or vincristine (Fig. 1c, d). Because the 
ratio of pro-survival to pro-apoptotic BCL2 family proteins dictates cell 
fate’, we monitored the levels of these proteins during mitotic arrest, as 
indicated by phosphorylation of the anaphase-promoting complex 
subunit CDC27 (ref. 6). MCL1 protein levels declined markedly in 
synchronized cells released into nocodazole or Taxol (Fig. 1e and Supplementary 
Fig. 4). The decrease in NOXA protein levels is probably 
an indirect consequence of MCL1-regulated stability (D.C.S.H., unpublished 
observations). MCL1 protein levels also declined in unsynchronized 
cells that were arrested in mitosis (Supplementary Figs 5 and 34). 

MCL1 transcription was not significantly decreased during mitotic 
arrest in human cell lines (Fig. 2a). This implicated a role for the 
ubiquitin—proteasome system, the primary conduit for regulated pro- 
tein degradation in eukaryotic cells’, in the reduction of MCL1 protein 
levels. Indeed, the proteasome inhibitor MG132 blocked MCL1 degradation 
(Fig. 2b and Supplementary Fig. 6), and endogenous MCL1 was 
ubiquitylated during mitotic arrest (Supplementary Fig. 7). 

MCL1 contains potential degron motifs for association with the 
F-box proteins β-transducin-repeat-containing protein (β-TRCP; also 
known as FBXW1 or FWD1)* and FBW7 (also known as FBXW7, 
AGO, CDC4 or SEL10)° (Supplementary Fig. 8). F-box proteins are 
substrate receptors for SKP1-CUL1-F-box (SCF)-type ubiquitin ligase 
complexes, which mediate degradative polyubiquitylation®’®. Con- 
sistent with a role for CUL1-based ubiquitin ligases in MCL1 turnover, 
ectopic expression of a dominant-negative CUL1 protein blocked 
MCL1 degradation during mitotic arrest (Supplementary Fig. 9). 
These data indicate that CUL1-containing ubiquitin-ligase complexes 
have a more prominent role in regulating MCL1 turnover during 
mitotic arrest than MULE, a ligase that ubiquitylates MCL1 (ref. 11), 
an idea corroborated by knocking down MULE expression in Taxol- 
treated cells by using RNA interference (RNAi) (Supplementary Fig. 
10a-c). RNAi-mediated knockdown of FBW7 expression, but not 
β-TRCP expression, attenuated MCL1 degradation in tumour cells 
(Fig. 2c and Supplementary Figs 11 and 12) and untransformed cells 
(Supplementary Fig. 13a, b). MCL1 degradation (Fig. 2d) and turnover 
(Supplementary Fig. 14) were protracted in FBW7-null cells relative to 

Figure 1 | BCL2 family proteins regulate cell death induced by antitubulin 
chemotherapeutic agents. a–d, Viability of cell lines treated for 48 h with the 
indicated agents. Data are presented as the mean ± s.e.m.; n = 3. E1A/RAS- 
transformed Bax−/− Bak−/− MEFs (a) and factor-dependent myeloid (FDM) 
cells (b) are resistant to Taxol-induced cell death. c, Genetic deletion of Mcl1 or 
Bclx enhances sensitivity to Taxol. d, Genetic deletion of Mcl1, but not of Bclx, 
enhances sensitivity to vincristine. e, Assessment of BCL2 family protein levels, 
by western blotting (WB), during mitotic arrest. The mitotic time course 
indicates when synchronized cells were collected relative to the onset of mitotic 
arrest: that is, —2 denotes 2 h before mitosis (M), and +3 denotes 3 h after cells 
entered mitosis. CDC27 and tubulin are indicators of mitotic arrest and equal 
loading, respectively. CDC27-P, phosphorylated CDC27. 


WT cells, and complementation with FBW7 isoforms restored MCL1 
degradation (Fig. 2d and Supplementary Fig. 15). Endogenous MCL1 
was recruited to cellular SCF complex subunits in FBW7-WT but not 
FBW7-null cells during mitotic arrest (Fig. 2e). Recombinant MCL1 
was ubiquitylated in vitro by the reconstituted FBW7-containing SCF 
complex (SCF^FBW7) when the complete ligase complex was assembled 
(Fig. 2f). Collectively, these results demonstrate that SCF^FBW7 promotes 
MCL1 degradation during mitotic arrest. 

Because substrate phosphorylation promotes recruitment to FBW7 
(ref. 9), the phosphorylation status of candidate FBW7-binding degrons 
on MCL1 was evaluated in cells arrested in mitosis (Fig. 3a). Mass 
spectrometry identified phosphorylation of residues S64, S121, S159 
and T163 (Fig. 3a and Supplementary Fig. 16a–d). Myc-tagged MCL1 

Figure 2 | SCF^FBW7 targets MCL1 for proteasomal degradation during 
mitotic arrest. a-e, Human carcinoma cell lines were synchronized and 
collected throughout the mitotic time course as in Fig. 1a. During mitotic arrest, 
MCL1 mRNA levels are not significantly decreased relative to MCL1 protein, as 
determined by WB (numbers indicate molecular mass in kDa). MCL1 
expression was monitored by real-time PCR, and the percentage mRNA is 
indicated relative to the —4-h time point. b, MG132 stabilizes MCL1 
degradation during mitotic arrest in HeLa cells. c, RNAi oligonucleotides 
targeting FBW7, but not control scrambled RNAi or RNAi oligonucleotides 
targeting BTRC (which encodes β-TRCP), attenuate MCL1 degradation during 
mitotic arrest in HCT 116 cells. d, MCL1 degradation is attenuated in 
FBW7−/− HCT 116 cells during mitotic arrest. Complementation with the 
α-isoform or β-isoform of FBW7 restores MCL1 degradation. e, FBW7 recruits 
MCL1 to the SCF ubiquitin ligase complex core, the components of which are 
CUL1, SKP1 and ROC1, in HCT 116 cells in mitotic arrest. IP, 
immunoprecipitation. f, Left, reconstitution of the SCF^FBW7 ubiquitin ligase 
complex promotes MCL1 ubiquitylation in vitro. Ubiquitylation reactions 
containing the indicated components were reacted in vitro with biotinylated 
ubiquitin. Reacted components were denatured, and Flag-MCL1 was 
immunoprecipitated (IP) and blotted (WB) for biotin to reveal in vitro- 
ubiquitylated MCL1 (MCL1-Ub). Myc-tagged F-box proteins (including 
F-box-deleted FBW7 (FBW7-ΔFBox)), Flag-MCL1 and HA-tagged CUL1 
variants were also immunoprecipitated and analysed as indicated by WB 
analysis to reveal the respective input levels. Wedges indicate an increasing 
amount of the indicated reaction component. Right, endogenous ROC1 does 
not associate with dominant-negative (DN) HA-tagged CUL1. E1, ubiquitin- 
activating enzyme; UBCH5A, E2 ubiquitin-conjugating enzyme. 

Figure 3 | Identification of MCL1 degron motifs and protein kinases that 
direct recruitment to FBW7 during mitotic arrest. a, The FBW7 degron 
consensus sequence (top, with potential phosphorylation sites or phospho- 
mimic residues in red), corresponding MCL1 residues (coloured, centre) and 
confirmed phosphorylation sites (P) during mitosis are indicated for three 
MCL1-derived peptide sequences. Phosphorylation at S159 (red) rather than 
S162 (orange) was confirmed by co-elution with a synthetic peptide (see 
Supplementary Fig. 16). Φ, hydrophobic amino acid; X, any amino acid. The 
MCL1 phospho-mutant nomenclature used is indicated. b, Association of 
Flag-FBW7 with Myc-MCL1 mutants S121A/E125A, S159A/T163A, and 4A 
is attenuated in mitotic arrest. The indicated constructs were expressed in HeLa 
cells that were synchronized, released into Taxol, and processed as indicated. 
c, MCL1 phospho-mutants S121A/E125A, S159A/T163A and 4A have 
attenuated degradation during mitotic arrest. HCT 116 cells were synchronized 
and collected throughout the mitotic time course as in Fig. 1a. d, Schematic 
representation of MCL1- or cyclin-E-derived peptides and their calculated 
dissociation constants (Kd), averaged from duplicate experiments 


was efficiently recruited to Flag-tagged FBW7 during mitotic arrest 
(Supplementary Fig. 17), and MCL1 residues 1-170 directed binding 
to FBW7 (Supplementary Fig. 18), thus mutant MCL1 constructs were 
tested to identify the degrons that confer FBW7 association (Fig. 3a). 
The MCL1 mutants S121A/E125A (in which the serine residue at position 
121 and the glutamic acid residue at position 125 are both replaced 
by alanine residues) and S159A/T163A bound to FBW7 less efficiently 
than WT MCL1 (Fig. 3b), and their degradation during mitotic arrest 
was attenuated (Fig. 3c). Assessment of the relative affinities of the 
phosphorylated WT MCL1 degrons for FBW7 showed that the S121/E125 
site is a higher affinity degron than the S159/T163 site (Fig. 3d, e). 
Thus, similar to other FBW7 substrates such as cyclin E’, MCL1 contains 
high-affinity and low-affinity FBW7 degrons, both of which are 
required for efficient recruitment to (Fig. 3b) and subsequent degradation 
by (Fig. 3c) SCF^FBW7 in the context of full-length MCL1. 
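
To make the comparison of high-affinity and low-affinity degrons concrete, the sketch below computes the fraction of FBW7 bound at a given peptide concentration under a simple one-site binding model with peptide in excess. The model choice and the two Kd values are assumptions for illustration only; they are not the values measured by ELISA in this study.

```python
def fraction_bound(peptide_conc, kd):
    """One-site binding isotherm: fraction of receptor occupied at a free ligand concentration.

    peptide_conc and kd must be in the same units (for example, micromolar).
    """
    return peptide_conc / (kd + peptide_conc)


# Hypothetical dissociation constants (same arbitrary concentration units).
KD_HIGH_AFFINITY = 1.0   # stands in for a tighter-binding degron peptide
KD_LOW_AFFINITY = 20.0   # stands in for a weaker-binding degron peptide

concentrations = [0.1, 1.0, 10.0, 100.0]
high = [fraction_bound(c, KD_HIGH_AFFINITY) for c in concentrations]
low = [fraction_bound(c, KD_LOW_AFFINITY) for c in concentrations]
```

At every concentration the peptide with the lower Kd occupies a larger fraction of FBW7, and occupancy is exactly one half when the concentration equals Kd.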

To investigate the protein kinase or kinases that direct MCL1 recruitment 
to FBW7 in response to antitubulin chemotherapeutics, we 
focused on kinases that contain MCL1 degron consensus sites and 
demonstrate activity in mitotic arrest. These include CDK1, casein 
kinase II (CKII), ERK isoforms (also known as MAPK1 and 
MAPK2), GSK3B, JNK isoforms (also known as MAPK8, MAPK9 
and MAPK10) and p38 isoforms (also known as MAPK11, 
MAPK12, MAPK13 and MAPK14) (Supplementary Figs 19 and 
24c). Studies using protein kinase inhibitors (Supplementary Figs 
20a, 21, 22a, b and 24a, b) or RNAi (Supplementary Figs 20b, 23a–c 

(mean ± s.d.), for FBW7 binding as determined by ELISA. e, The MCL1- 
derived peptide containing the phosphorylated S121/E125 degron (MCL1 
S121-P) preferentially binds to FBW7 in vitro. Graphical representation of the 
fraction of FBW7-bound cyclin E or MCL1 peptides as a function of peptide 
concentration is shown. DMSO, dimethylsulphoxide. f, Pharmacological 
inhibition of JNK, p38 or CDK1 (with inhibitor (and targeted kinase) indicated, 
top) attenuates recruitment of Myc-MCL1 to Flag—FBW7 during mitotic 
arrest. The indicated constructs were expressed in HeLa cells with or without 
CDC20 RNAi oligonucleotides or control scrambled RNAi oligonucleotides, 
and cells were then synchronized and released into Taxol. When cells entered 
mitotic arrest, the indicated agents were added for 1h followed by a 3-h 
incubation with 25 µM MG132 before collection and processing as indicated 
(see Supplementary Fig. 25). g, In vitro phosphorylation of recombinant MCL1 
drives FBW7 binding. Full-length MCL1 was subjected to in vitro 
phosphorylation with the indicated kinases and subsequently incubated with 
recombinant Flag-FBW7. Anti-Flag immunoprecipitates were resolved by 
SDS-PAGE and probed with antibodies specific for the indicated proteins. 


and 24a–c) indicated that the activities of JNK, p38, CKII and CDK1 
regulate MCL1 degradation during mitotic arrest. Because CDK1 
inhibition drives cells out of mitosis’? (Supplementary Figs 21 and 
22a, b), non-degradable cyclin B1 was expressed, or CDC20 expression 
was knocked down, to maintain cells in mitotic arrest'’ (Supplementary 
Fig. 24a, b). Inhibition of JNK, p38 or CDK1 also attenuated MCL1 
recruitment to FBW7 (Fig. 3f and Supplementary Figs 25 and 26). JNK, 
p38 and CKII, but not CDK1, directly phosphorylated MCL1 degrons 
(Supplementary Table 1a–c). JNK and p38 directly promoted MCL1- 
FBW7 binding, whereas the contribution by CDK1 was negligible 
(Fig. 3g), suggesting that CDK1 indirectly enhances MCL1 phosphor- 
ylation to promote binding to FBW7 in the cellular context. Indeed, 
CDK1 phosphorylates T92 (Supplementary Table 1d), a residue that is 
phosphorylated (Supplementary Fig. 16e) and regulates MCL1 turn- 
over (Supplementary Fig. 27a) during mitotic arrest. 

Because the phosphatase inhibitor okadaic acid regulates MCL1 
phosphorylation in a manner similar to Taxol”, we evaluated whether 
CDK1-directed phosphorylation of T92 blocked the association of the 
okadaic-acid-sensitive phosphatase PP2A with MCL1 during mitotic 
arrest. PP2A more readily dissociated from WT MCL1 than from the T92A 
mutant, concomitant with increasing CDK1 activity (Supplementary 
Fig. 27b). MCL1-associated PP2A protein levels and phosphatase 
activity are low in mitotic arrest when CDK1 activity is high, but they 
are restored after exit from mitosis, when CDK1 is inactivated (Supplementary 
Fig. 27c). Thus, the phosphorylation of MCL1 degron 
residues by JNK, p38 and CKII during mitotic arrest is probably 
initially opposed by phosphatases such as PP2A. Maximal activation 
of CDK1 in prolonged mitotic arrest promotes T92 phosphorylation 
and PP2A dissociation, allowing sufficient phosphorylation of MCL1 
degron residues to drive FBW7-mediated degradation (Supplemen- 
tary Fig. 1). These effects are revealed when microtubule-targeted 
agents are washed out of cells that are in mitotic arrest: the activities 
of JNK, p38 and CDK1 decline, and MCL1 protein levels are restored 
(Supplementary Fig. 28). Sufficient loss of MCL1 activates BAK and 
BAX (Supplementary Fig. 29) to promote apoptosis. 

FBW7 is a haploinsufficient tumour suppressor that targets proto- 
oncoproteins—including Myc, Jun, NOTCH and cyclin E—for degrada- 
tion’. FBW7 mutations that were identified in patient-derived cell lines 
disrupted the association of FBW7 with MCL1 during mitotic arrest 
(Supplementary Fig. 30). Thus, failure of inactivated FBW7 to promote 
MCL1 degradation could confer resistance to antitubulin chemotherapeutics. 
Indeed, FBW7-null cell lines showed attenuated MCL1 
degradation and were more resistant to Taxol- or vincristine-induced 
cell death than were WT cells (Supplementary Figs 31 and 32). BCL-XL 
remained stable regardless of FBW7 status (Supplementary Fig. 31). 

Similar trends were seen in patient-derived ovarian (Fig. 4a) and 
colon (Supplementary Fig. 33) cancer cell lines harbouring naturally 
occurring FBW7 mutations. Although the response to antitubulin agents 
is heterogeneous within a cell population’®, mitotic arrest was similarly 
activated by Taxol treatment in synchronized and asynchronous ovarian 
cancer cell lines (Fig. 4a and Supplementary Fig. 34). Moreover, MCL1 
degradation profiles were similar in synchronized and asynchronous 
cells: MCL1 was efficiently degraded in FBW7-WT cells that are effec- 
tively arrested in mitosis, yet MCL1 persisted in TOV21G cells that 
undergo only transient mitotic arrest and in FBW7-mutant SKOV3 cells 
(Fig. 4a and Supplementary Fig. 34). Thus, the inappropriate survival of 
cells that are arrested in mitosis positively correlates with attenuated 
MCL1 degradation, which is, in turn, regulated by FBW7. 

FBW7 with an R505L mutation was expressed in FBW7-WT 
TOV112D-X1 cells to mimic cells harbouring one mutated FBW7 
allele? and to assess the in vivo effects. Tumours expressing mutant 
FBW7 were more resistant to Taxol (Supplementary Fig. 35a) and had 
higher levels of MCL1 than FBW7-WT parental tumours (Supplementary 
Fig. 35b, c). BCL-XL protein levels were unaffected by FBW7 status 
(Supplementary Fig. 35b, d). Reducing the amount of MCL1 protein in 
FBW7-null cells restored their sensitivity to Taxol- and vincristine- 
induced death (Fig. 4b and Supplementary Fig. 36), demonstrating 
that MCL1 is a crucial pro-survival factor that is responsible for resistance 
to antitubulin agents in FBW7-deficient cells. 

Previous studies have shown that blocking apoptosis during mitotic 
arrest allows cells to exit mitosis and evade cell death’ and that FBW7- 
null cells more frequently exit mitosis and undergo endoreduplication 
to render cells polyploid’®. Our work identifying MCL1 as an FBW7 
substrate therefore suggests a molecular link to explain antitubulin 
agent resistance and chemotherapy-induced polyploidy. Indeed, 
FBW7-null cells exit Taxol- or vincristine-induced mitotic arrest more 
readily (Fig. 4d and Supplementary Figs 37 and 38) and show more 
pronounced polyploidy (Fig. 4c) than do FBW7-WT cells. Reducing 
the MCL1 protein levels in the FBW7-null cells with short hairpin RNA 
(shRNA) decreased mitotic slippage, enhanced Taxol- or vincristine- 
induced apoptosis (Fig. 4d and Supplementary Figs 37 and 38) and 
reduced chemotherapeutic-induced polyploidy (Fig. 4c) compared 
with FBW7-null cells treated with control shRNA. Thus, MCL1 promotes 
resistance to death induced by antitubulin chemotherapeutics 
and facilitates genomic instability when FBW7 is inactivated. 

The hostile tumour micro-environment, like chemotherapeutic 
insults, exerts selective pressures on malignant cells; therefore, tumour 


Figure 4 | FBW7 inactivation and increased MCL1 levels promote 
antitubulin agent resistance and tumorigenesis in human cancers. a, FBW7- 
WT ovarian cancer cell lines that undergo mitotic arrest are sensitive to Taxol 
(left) and rapidly degrade MCL1 relative to FBW7-mutant and Taxol-resistant 
cells (right). FBW7 status is specified in parentheses. b, Sensitivity to vincristine- 
induced cell death is restored in FBW7−/− cells on MCL1 ablation (red). WT or 
FBW7−/− HCT 116 cells were transduced with the indicated doxycycline- 
inducible shRNA constructs, cultured in the presence of doxycycline, and treated 
with various concentrations of vincristine for 48 h before cell viability 
assessment. shLacZ, control shRNA (green and blue). Data are presented as 
mean ± s.e.m.; n = 3. c, MCL1 expression modulates polyploidy in FBW7- 
deficient HCT 116 cells. WT or FBW7−/− HCT 116 cells were transduced with 
the indicated doxycycline-inducible shRNA constructs, cultured in the presence 
of doxycycline, synchronized and released into vincristine. They were then 
collected at 5 h (+5 h) or 10 h (+10 h) after mitotic arrest and fixed, stained with 
propidium iodide and analysed by FACS (x axis, fluorescence units; y axis, 
number of cells). M1, percentage of cells with >2N DNA content. d, MCL1 
expression increases mitotic slippage and attenuates apoptosis in FBW7- 
deficient cells. WT or FBW7−/− HCT 116 cells were transduced with the 
indicated doxycycline-inducible shRNA constructs, cultured in the presence of 
doxycycline, transduced with an H2B-GFP-expressing baculovirus, 
synchronized, treated with the indicated antitubulin agents and imaged live. 
Three images were acquired every 10 min for 43 h, and 50 cells were analysed for 
each condition. *, P < 0.05; **, P < 0.001 (one-tailed Fisher’s exact test). 

e, MCL1 levels are elevated in non-small-cell lung cancer (NSCLC) samples with 
mutant FBW7 or low FBW7 copy number relative to FBW7-WT tumours and 
normal lung samples (see also Supplementary Table 2). NSCLC FBW7-mutant 
samples 3 and 5 (green) also have low FBW7 copy number. 

cells harbouring alterations in FBW7 and MCL1 should be selected for 
and enriched in primary patient tumour samples. To this end, copy 
number analysis of FBW7 and MCL1 was performed in ovarian tumour 
samples (Supplementary Fig. 39). The co-occurrence of MCL1 gain and 
FBW7 loss was more frequent than expected, a finding that is consistent 
with selection for both genetic alterations (Supplementary Fig. 39). 
Data from non-small-cell lung cancer samples showed similar trends 
but were not statistically significant owing to insufficient sample size 
(data not shown). Immunoblotting of patient samples revealed that 
most tumours in which FBW7 was inactivated had increased MCL1 
protein levels relative to FBW7-WT tumours and normal lung samples 
(Fig. 4e and Supplementary Table 2). By contrast, BCL-XL protein levels 
were not correlated with FBW7 status (Fig. 4e). Thus, functional FBW7 is 
required to downregulate MCL1 expression in primary patient samples, a 
particularly significant finding given that antitubulin agents are thera- 
peutic mainstays for non-small-cell lung cancers and ovarian cancers. 

The signalling pathways that activate cell death induced by antitu- 
bulin chemotherapeutics are of crucial interest, and we provide genetic 
evidence that both MCL1 and BCLX are important regulators of this 
therapeutic response. Whereas BCL-XL is functionally inactivated by 
phosphorylation” and is unaffected by FBW7 status, MCL1 inactivation 
is coordinated by the concerted activities of phosphatases, stress- 
activated and mitotic kinases, and the SCF^FBW7 ubiquitin ligase. As 
such, we define a unique molecular mechanism for regulation of 
MCL1 and initiation of apoptosis during mitotic arrest (Supplementary 
Fig. 1). By identifying SCF^FBW7 as a crucial ubiquitin ligase that 
directs MCL1 degradation during mitotic arrest, we also elucidate a 
mechanism for resistance to antitubulin chemotherapeutics. Analysis of 
patient samples suggests that drug-efflux pumps"® or tubulin alterations” 
do not always account for resistance to antitubulin agents, thus evasion of 
apoptosis owing to inappropriately increased levels of MCL1 is probably 
a crucial strategy. We also show that the elevated MCL1 protein levels in 
FBW7-deficient cells favour increased mitotic slippage, endoreduplication 
and subsequent polyploidy in response to antitubulin therapeutics. 
The role of MCL1 in FBW7-deficient cells therefore extends beyond 
the simple inhibition of apoptosis; it also facilitates genomic aberra- 
tions, thus fuelling the transformed state. 


METHODS SUMMARY 


The viability of cancer cell lines, and MEFs in which genes encoding IAPs had been 
knocked out, was analysed by using the CellTiter-Glo Luminescent Cell Viability 
Assay (Promega). Cells were treated in triplicate with antitubulin agents for the 
indicated times, using dimethylsulphoxide treatment as a control. The viability of 
BCL2-family-member-null MEFs was analysed by propidium iodide staining, as 
described previously”®, after treatment with antitubulin agents for 48 h. Cell 
synchronization was achieved by culture either in serum-free medium for 12–16 h or 
in medium containing 2 mM thymidine for 18–24 h, release from the thymidine 
block with three washes in PBS, followed by culture for 8-12 h in complete growth 
media (compositions are described in the Supplementary Information). Cells then 
underwent a second thymidine block for 16-20 h, three further washes in PBS and 
release into complete medium containing the indicated reagents. To block MCL1 
degradation, 25 µM MG132 was added as cells entered mitotic arrest, as assessed 
by visual inspection. See Supplementary Information for full methods. 

X chromosome dosage compensation via enhanced 
transcriptional elongation in Drosophila 


Erica Larschan'***, Eric P. Bishop*>*, Peter V. Kharchenko*, Leighton J. Core®, John T. Lis°, Peter J. Park* & Mitzi I. Kuroda’? 


The evolution of sex chromosomes has resulted in numerous species 
in which females inherit two X chromosomes but males have a single 
X, thus requiring dosage compensation. MSL (Male-specific lethal) 
complex increases transcription on the single X chromosome of 
Drosophila males to equalize expression of X-linked genes between 
the sexes’. The biochemical mechanisms used for dosage compensa- 
tion must function over a wide dynamic range of transcription levels 
and differential expression patterns. It has been proposed’ that 
the MSL complex regulates transcriptional elongation to control 
dosage compensation, a model subsequently supported by mapping 
of the MSL complex and MSL-dependent histone 4 lysine 16 acet- 
ylation to the bodies of X-linked genes in males, with a bias towards 
3’ ends*’. However, experimental analysis of MSL function at the 
mechanistic level has been challenging owing to the small mag- 
nitude of the chromosome-wide effect and the lack of an in vitro 
system for biochemical analysis. Here we use global run-on sequen- 
cing (GRO-seq)* to examine the specific effect of the MSL complex 
on RNA Polymerase II (RNAP II) on a genome-wide level. Results 
indicate that the MSL complex enhances transcription by facilitat- 
ing the progression of RNAP II across the bodies of active X-linked 
genes. Improving transcriptional output downstream of typical 
gene-specific controls may explain how dosage compensation can 
be imposed on the diverse set of genes along an entire chromosome. 

To investigate how the MSL complex specifically increases tran- 
scription of X-linked genes, we performed GRO-seq in SL2 cells, a 
male Drosophila cell line that has been extensively characterized for 
MSL function*’. To show the average enrichment across genes, a 3-kb 
‘metagene’ profile was plotted in which the internal regions were 
rescaled so that all genes appear to have the same length (Fig. 1). 
Analysis was restricted to expressed genes that were sufficiently large 
(>2.5kb) so that gene-body effects could be clearly assessed (822 
X-linked genes, 3,420 autosomal genes), and all gene profiles were 
normalized by their copy number as determined by analysis of SL2 
DNA content"®. High correlation coefficients were observed between 
replicate libraries (Pearson correlation coefficient, r = 0.98; Supplementary 
Fig. 1). The metagene profiles revealed a prominent 5’ peak of 
paused RNAP II consistent with previous chromatin immunoprecipi- 
tation (ChIP) and analysis of short 5’ RNAs'?? (RNA-seq). In addi- 
tion, a peak of RNAP II density downstream of the metagene 3’ 
processing site is evident, possibly due to slow release in regions of 
transcription termination®. The 3’ peak is present even when the influ- 
ence of neighbouring gene transcription is eliminated (Supplementary 
Fig. 2). 

The central question with regard to dosage compensation is how 
genes on the X chromosome differ on average from genes on auto- 
somes. Overall, we found that RNAP II density on active X-linked 
genes was higher than on autosomal genes, specifically over gene 
bodies (Fig. 1a). The increase in tag density over the bodies of 
X-linked genes compared to autosomal genes was approximately 
1.4-fold, consistent with previous estimates of MSL-dependent dosage 
compensation®!®'’. We also performed RNAP II ChIP in SL2 cells, 
confirming higher occupancy on X-linked genes compared to auto- 
somes but with lower resolution and reduced sensitivity (Supplemen- 
tary Fig. 3). Therefore, we proceeded with GRO-seq to analyse X and 
autosomal differences. 

Figure 1 | The male X chromosome has higher levels of engaged RNAP II 
over gene bodies relative to autosomes. a, Average GRO-seq profiles of 
expressed genes are shown for X (red) and autosomes (blue). Read counts on all 
chromosomes were normalized to genomic read coverage to control for copy 
number variation, mappability and other potential biases. To construct a 
metagene profile, genes are scaled as follows: (1) the 5’ end (1 kb upstream of 
the transcription start site (TSS) to 500 bp downstream) and the 3’ end (500 bp 
upstream of the transcript termination site (TTS) to 1 kb downstream) were 
unscaled; (2) the remainder of the gene is scaled to 2 kb (see Supplementary 
Methods). b, PI values do not differ between X (red bar) and autosomal genes 
(blue bar). EdI values are significantly different between X (red bar) and 
autosomal genes (blue bar). Error bars represent a 95% confidence interval for 
the mean PI or EdI (1.96 × s.e.m.; n = 1,344 (X genes); n = 6,090 (autosomal 
genes)). The definitions of PI and EdI are shown in the schematic. The PI and 
EdI are calculated with unscaled GRO-seq tag counts. NS, not significant. 

To measure how X and autosomes differed on average in the distribution 
of elongating RNAP II, we segmented genes into their 5’ 
500 bp and the remainder of the coding region. We subdivided further 
the remainder of the coding region into 5’ and 3’ segments (25% and 
75%, respectively). Using this segmentation, we quantified RNAP II 
pausing and elongation separately on the basis of the unscaled GRO- 
seq signal (Fig. 1b). The pausing index (PI) was previously defined as 
the ratio of the GRO-seq signal at the 5’ peak to the average signal over 
gene bodies*. Here, we calculated the PI for X and autosomal genes as 
the ratio of the 5’ peak (segment A) to the first 25% of the remaining 
gene body (segment B), and found no statistically significant difference 
when the two groups were compared (Fig. 1b). 

To examine separately transcription elongation across gene bodies, 
we defined the elongation density index (EdI) as the ratio of tag density 
in the 3’ region of each gene (segment C) compared to its 5’ region 
after the first 500 bp (segment B). In contrast to our analysis of 5’ 
pausing, we found statistically significant differences in EdI (P value 
< 0.0162) between X and autosomes (Fig. 1b). This conclusion was 
robust to how the 5’ and 3’ regions of genes were divided (Supplemen- 
tary Table 1). As defined, the average PI (log scale) is a positive number 
because RNAP II is generally enriched at 5’ ends compared to gene 
bodies; the average EdI (log scale) is a negative number, as the relative 
density of RNAP II typically decreases from the beginning to the end of 
gene bodies. We conclude that X-linked genes, on average, show a 
significantly smaller decrease in RNAP II density along their gene 
bodies when compared to autosomal genes. 

To measure the specific contribution of the MSL complex to the 
increase in RNAP II within X-linked gene bodies, we used MSL2 RNA 
interference (RNAi) to reduce complex levels in male SL2 cells as 
described previously’. Excellent correlations between replicate data 
sets were observed (Supplementary Fig. 1). To confirm the X-specific 
effect of MSL2 RNAi, we computed the distributions of the GRO-seq 
signal (averaged over the bodies of genes excluding the 5’ peak) for all 
genes before and after RNAi. When comparing X versus autosomes, we 
found a preferential decrease on the X chromosome, with an average 
control:MSL RNAi ratio of 1.4 (Fig. 2a). MSL-dependent changes in 
average GRO-seq density showed a weak but statistically significant 
correlation with changes in steady-state messenger RNA levels assayed 
by expression array” (Pearson correlation = 0.22, Pvalue <1 x 10° *°) 
or mRNA-Seq’° (Pearson correlation = 0.30, P value <1 X10"). 
These results confirm that MSL-dependent changes in steady- 
state RNA levels reflect differences in active transcription on the X 
chromosome. 

In addition to assessing the average decrease of X-linked RNAP II 
density after MSL2 RNAi, we asked whether any genes showed strong 
MSL-dependence, a hallmark of the roX genes that encode RNA com- 
ponents of the complex’*"*. We found that roX2 showed a strong loss 
in GRO-seq density (ninefold) after MSL2 RNAi, as predicted (Fig. 2b 
and Supplementary Fig. 4). Interestingly, in the untreated or control 
RNAi samples, there is a prominent GRO-seq peak downstream of the 
major roX2 3' end, coincident with an MSL recruitment site (see 
discussion later). roX1 expression is low in this isolate of SL2 cells, 
and no other expressed genes on X or autosomes showed strong MSL 
dependence in our assays (>6-fold). Examples of additional individual 
gene profiles are shown in Supplementary Figs 5 and 6. 

Next we compared the average RNAP II density along X and auto- 
somal metagene profiles after control and MSL2 RNAi. Unlike our 
initial analysis of X and autosomes, where different gene populations 
were compared (Fig. 1), here we could examine the same genes in the 
presence and absence of the MSL complex (Fig. 3). We found that after 
MSL2 RNAi, the density of elongating RNAP II over the bodies of 
X-linked genes decreased, approaching the level on autosomes (Fig. 3 
and Supplementary Fig. 7). The presence of the MSL complex affected 
RNAP II density starting just downstream of the 5’ peak and continu- 
ing through the bodies of X-linked genes (Fig. 3 and Supplementary 
Fig. 7). Thus, GRO-seq functional data correlate with physical asso- 
ciation of the MSL complex, which is biased towards the 3’ ends of 
active genes on the male X chromosome*”. 
Figure 2 | The MSL complex increases engaged RNAP II density on the male 
X chromosome. a, The log ratio of sense-strand reads in the MSL2 RNAi 
sample to the control RNAi sample was computed within the body of each gene. 
Here, the distributions of these ratios are plotted for all genes on X and 
autosomes. b, GRO-seq sense-strand read densities within the roX2 gene (x- 
axis denotes base pairs from the roX2 transcription start site) for the untreated, 
control RNAi and MSL2 RNAi samples. Schematic below GRO-seq profiles 
indicates the location of the DHS site, which contains sequences that can recruit 
the MSL complex to the X chromosome. 


To quantify the differences in density of engaged RNAP II in the 
presence and absence of the MSL complex, we calculated the PI and 
EdI for each gene, followed by the PI and EdI ratios comparing MSL2 
and control RNAi treatment. We found that both X and autosomes 
increased PI and decreased EdI after MSL2 RNAi treatment 
(Supplementary Fig. 8). However, in each case the change was larger 
on X than on autosomes, and the most profound difference was an 
MSL-dependent change in EdI on X compared with autosomes 
(P<1X 10 ?°; Fig. 3b). EdI was computed, as before, by defining 
the 5’ and 3’ regions as 25% and 75%, respectively, of the gene body 
after removing the 5’ peak, but the difference was statistically signifi- 
cant for all other values until the 3’ end was reached (Supplementary 
Table 1). When these analyses were performed separately for two 
independently prepared sets of GRO-seq libraries (Supplementary 
Fig. 9), the results were also statistically significant (P value 
<7.6X 10 ', P value < 1.1 X 10~* for each of two replicates). We 
conclude that the MSL complex causes the transcriptional elongation 
profiles of X-linked genes to differ from those of autosomal genes. 

Figure 3 | The MSL complex facilitates the progression of engaged RNAP II 
across transcription units. a, Metagene profiles of expressed X chromosome 
genes and autosomal genes in control RNAi and MSL2 RNAi samples. Higher 
RNAP II density can be seen within the bodies of genes on the X chromosome 
(solid red) compared to those on autosomes (solid blue) in the control RNAi 
sample. After MSL2 RNAi, average RNAP II density on X decreases over gene 
bodies (dashed red) becoming similar to autosomal gene bodies (dashed blue). 
b, Ratios of PI between control and MSL2 RNAi-treated cells are not significantly 
different for genes on the X chromosome (red bar) compared to those on 
autosomes (blue bar). In contrast, ratios of EdI between the control and MSL2 
RNAi sample decreased significantly for genes on the X (red bar) compared to 
those on the autosomes (blue bar). PI and EdI were calculated as described for 
Fig. 1. Error bars represent a 95% confidence interval for the mean PI or EdI 
ratios (1.96 × s.e.m.; n = 1,358 (X genes); n = 6,135 (autosomal genes)). 

To visualize the location along gene bodies at which the MSL complex 
functions, we calculated control:MSL2 RNAi GRO-seq ratios and 
generated a metagene profile (Fig. 4a). Here, values above zero represent 
higher relative amounts of engaged RNAP II in the presence of the MSL 
complex compared to after RNAi treatment. In contrast, values below 
zero represent a relative increase in engaged RNAP II after MSL2 RNAi. 
In the absence of the MSL complex, there is a relative increase in the 
amount of RNAP II localized to the 5’ ends of both autosomal and 
X-linked genes, perhaps due to relocalization of RNAP II from the 
bodies of X-linked genes (Fig. 4a). A limitation of the GRO-seq assay 
is that we cannot currently distinguish between initiating and 5’ paused 
polymerase, so we cannot assign a definitive role for this 5’ increase in 
RNAP II after MSL2 RNAi treatment. However, relative RNAP II levels 
over autosomal gene bodies do not increase, indicating that any relo- 
calized enzyme in this experiment is likely to remain paused rather than 
progressing across transcription units. This is consistent with a model 
in which the functional outcome of MSL2 RNAi is to shift RNAP II 
density away from productive transcription through X-linked gene 
bodies. 

Figure 4 | MSL function correlates with the presence of H4K16 acetylation. 
a, The MSL2-dependent effect on RNAP II density as shown by metagene 
profiles of control:MSL2 RNAi GRO-seq sense-strand reads shown on log scale 
(base 2). The black line (y = 0) indicates no change after MSL2 RNAi treatment. 
The cumulative effect of MSL2 RNAi treatment peaks towards the 3’ ends of 
X-linked genes (red) while having less effect on autosomal genes (blue). 
b, Similar to the effect of the MSL complex on engaged RNAP II, H4K16 
acetylation on the male X chromosome localizes to the bodies of active genes 
with a 3’ bias (red). On autosomes, H4K16 acetylation is present at 5’ ends 
(blue) as described previously’. 

We plotted the local effect of the MSL complex in Fig. 4a to compare it 
to the status of histone 4 lysine 16 (H4K16) acetylation (Fig. 4b) catalysed 
by the MOF component of the MSL complex*'®. H4K16 acetylation 
typically is enriched at the 5’ ends of most active genes in mammals 
and flies®!’; in contrast, a 3’ bias of this mark is a distinctive characteristic 
of the dosage compensated male X chromosome in Drosophila’*”’. 
Interestingly, there is an overall coincidence across gene bodies between 
the MSL-complex-dependent GRO-seq signal and the presence of 
H4K16 acetylation’ (Fig. 4a). How might H4K16 acetylation biased 
towards the 3’ end of genes generate the improved transcriptional elongation 
indicated by our GRO-seq results? During transcription elongation, 
nucleosomes are thought to comprise a barrier to the progress of 
RNAP II'*”° and several well-studied elongation factors, including Spt6 
and the FACT complex, are proposed to function by removing nucleo- 
somes that block RNAP II progression and replacing them in the wake of 
transcription'*”’. Interestingly, H4K16 acetylation of nucleosomes has 
been observed to act in opposition to the formation of higher-order 
chromatin structure in vitro’. Thus, H4K16 acetylation is likely to 
reduce further the steric hindrance to RNAP II progression through 
chromatin. Improving the entry of RNAP II into the bodies of genes 
may allow 5’, gene-specific events to proceed at an increased but still 
regulated rate. Furthermore, reduction in the repressive effect of nucleo- 
somes could increase mRNA output by improving the processivity of 
RNAP II on each template. Available methodologies cannot distinguish 
between these mechanisms in vivo, and therefore future approaches will 
be required to assess their relative contributions to dosage compensation. 

In addition to increasing the transcription of X-linked genes for 
dosage compensation, the MSL complex also positively regulates the 
roX noncoding RNA components of the complex, to promote their 
male specificity'’*’*. roX1 expression is low in our SL2 cell line, but our 
GRO-seq data indicate that active transcription of roX2 is highly 
dependent on MSL2 as predicted (Fig. 2b and Supplementary Fig. 
4). Interestingly, there is a strong GRO-seq peak at the 3’ roX2 DHS 
(DNase I hypersensitive site), which contains sequences important for 
targeting the MSL complex to the X chromosome. Sites of roX gene 
transcription are thought to be critical for MSL complex assembly**”’. 
Therefore, it is possible that paused RNAP II at the roX2 DHS could 
promote an open chromatin structure that facilitates MSL complex 
targeting or incorporation of noncoding roX2 RNA into the complex. 

In summary, we propose that the MSL complex functions on the male X 
chromosome to promote progression and processivity of RNAP II 
through the nucleosomal template, as foreseen by Lucchesi*. Improving 
transcriptional output downstream of typical gene-specific regulation 
makes biological sense when compensating the diverse set of genes found 
along an entire chromosome. 


METHODS SUMMARY 


To measure the density of engaged RNAP II, GRO-Seq experiments were conducted 
on DRSC SL2 cells grown in Schneider’s medium with 10% FBS*. To determine how 
the MSL complex contributes to dosage compensation, MSL2 and control (GFP) 
RNAi treatments were conducted using a bathing protocol’. Nuclei were subjected 
to GRO-seq analysis after RNAi treatment. Two biological replicates were per- 
formed for the untreated, control RNAi and MSL2 RNAi experiments. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 
RNAi and cell culture methods. Control and MSL2 RNAi were performed in 
SL2-DRSC cells as described previously’. The control RNAi construct targeted the 
eGFP gene that is not present in SL2 cells, and the experimental RNAi construct 
targeted the MSL2 gene (http://www.flyrnai.org: DRSC 00829). Primer sequences 
for generation of the eGFP double-stranded RNA (dsRNA) template by PCR from 
pEGFP-N1(Clontech) were: forward, 5’-TAATACGACTCACTATAGGGAGA 
GGTGAGCA-AGGGCGA-GGAGCT-3’; and reverse, 5'-TAATACGACTCACT 
ATAGGGAGATCT-TGAAGTTCACCTTGATGC-CG-3’. The primers used for 
amplifying the MSL2 gene from Drosophila genomic DNA were: 5'-TAA 
TACGACTCACTATAGGGAGAGTTGGCTGTG-CTGGCTG-3’; and reverse, 
5'-TAATACGACTCACTATAGGGAGATGTTGGCTCGTCAC-TGTC-3’. 
dsRNA was synthesized from PCR products containing T7 promoters using the 
Ambion MEGAscript kit, and 225 µg of dsRNA was added to 2 × 10^7 cells in a 
T225 flask. RNAi treatment was performed for 6 days after which mRNA was 
prepared and transcriptionally active nuclear extracts were generated as described 
later. mRNA preparation, complementary DNA synthesis and qPCR analysis of 
roX2 and msl2 RNA compared with the PKA normalization control were per- 
formed as described previously’. A 12.3-fold average decrease of msl2 mRNA was 
observed after MSL2 RNAi treatment when compared with the control treatment. 
Preparation of GRO-seq libraries for next-generation sequencing. Preparation 
of transcriptionally active nuclei from Drosophila SL2-DRSC cells after RNAi 
treatment was conducted as follows: SL2 cells grown in a T225 tissue culture flask 
were scraped and 1 X 10° cells were pelleted at 500g for 3 min at 4 °C. Then, cells 
were washed in 10 ml of cold PBS and spun at 500g for 3 min at 4 °C. Cells were 
swelled by resuspending gently in 10 ml ice-cold swelling buffer (10 mM Tris 
(pH = 7.5), 2 mM MgCl2, 3 mM CaCl2) and placed on ice for 5 min. Next, cells 
were pelleted at 600g for 10 min at 4 °C. Pelleted cells were resuspended in 1 ml 
lysis buffer (10 mM Tris (pH = 7.5), 2 mM MgCl2, 3 mM CaCl2, 10% glycerol, 
0.5% NP40, 2 U ml−1 SUPERaseIN (Invitrogen)) and pipetted 20 times with a 
P1000 tip with the end cut off. Nine millilitres of lysis buffer was added and nuclei 
were pelleted at 600g for 5 min. Nuclei were washed in 1 ml lysis buffer and then 
9 ml was added followed by pelleting for 5 min at 600g at 4 °C. A small aliquot was 
taken for Trypan blue staining to check that lysis occurred and nuclei were still 
intact. Next, nuclei were resuspended in 1 ml freezing buffer (50mM Tris-Cl 
(pH = 8.3), 40% glycerol, 5 mM MgCl2, 0.1 mM EDTA) using a P1000 tip with 
the end cut off. Nuclei were pelleted for 1 min and resuspended in 500 µl of 
freezing buffer and aliquoted into 100 µl aliquots and frozen in liquid nitrogen. 
All solutions were prepared with DEPC-treated water. 

GRO-seq libraries were prepared as described previously* with the following 
changes: GlycoBlue (3 µl; 15 mg ml−1; Ambion) was used in all of the ethanol 
precipitations to assure the release of the nascent RNAs from the interior surface 
of Eppendorf tubes; and wash buffers for BrU immunoprecipitation differ from 
those described in ref. 8 as follows. Firstly, high salt wash buffer for anti-BrdU 
(0.25× SSPE, 1 mM EDTA, 0.05% Tween, 137.5 mM NaCl); secondly, binding 
buffer for anti-BrdU (0.25× SSPE, 1 mM EDTA, 0.05% Tween, 37.5 mM NaCl); 
thirdly, elution buffer (20 mM DTT, 300 mM NaCl, 50 mM Tris-Cl pH 7.5, 1 mM 
EDTA, 0.1% SDS); lastly, all immunoprecipitation wash buffers contain 
superRNAsin (1 µl per 5 ml buffer) (Invitrogen) to block degradation that can 
occur during the immunoprecipitation process. 

Overview of computational analysis of GRO-seq data. For data generation and 
quality assessment, sequencing was performed on an Illumina Genome Analyser 
IIx. Two independent biological replicates were generated for each of the three 
experiments (untreated, control RNAi and MSL2 RNAi). Data are available from 
GEO under accession numbers GSE25321 and GSE25887. Reads were aligned to 
the D. melanogaster genome (dm3) using the Bowtie alignment software”. Only 
uniquely mapping reads with no more than one mismatch were retained. We 
obtained 10.6 million aligned reads from the untreated samples (7.1M from 
replicate I; 3.5M from replicate II), 25.2 million aligned reads from the control 
RNAi samples (20.5 M from replicate I, 4.7 M from replicate II), and 28.4 million 
from the MSL2 RNAi samples (22.4 M from replicate I, 6.0 M from replicate II). To 
assess the agreement between replicates, a correlation coefficient was computed 
between sense-strand read densities across genes in the two replicates for each of 
the three treatments. The agreement between replicates is excellent, with the 
following correlation coefficients: (1) untreated: Spearman, 0.97; Pearson, 0.98; 
(2) control RNAi: Spearman, 0.99; Pearson, 0.98; and (3) MSL2 RNAi: Spearman, 
0.99; Pearson, 0.98 (Supplementary Fig. 1). For most of the analysis, the two 
replicates were combined and processed together to increase statistical power. 
Key results were also confirmed in each replicate separately. 
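
The replicate comparison above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names and the toy per-gene densities are hypothetical, and assigning average ranks to ties is an assumption about how the Spearman coefficient was computed.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical sense-strand read densities for five genes in two replicates.
replicate_1 = [12.0, 3.5, 40.2, 7.7, 0.9]
replicate_2 = [11.1, 4.0, 38.5, 9.0, 1.2]
print(pearson(replicate_1, replicate_2), spearman(replicate_1, replicate_2))
```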

Generating average profiles. To examine the difference between RNAi and con- 
trol as well as between X and autosomes, it was important to derive accurate 
‘metagene’ profiles. To improve existing TSS annotations, previously published 
small (<100 bp), capped nuclear RNA-seq data”’ were used. This data set contains 
RNA isolated from 5’ ends of transcripts. Starting with FlyBase build 5.23, start 
sites for each annotated transcript were adjusted by up to 150 bp from the original 
location. The position within the 301 bp window centred on the existing TSS 
annotation with the highest number of reads from this capped nuclear RNA- 
seq data set was annotated as the new TSS for that transcript. In the event that 
two positions within the search space had the same number of reads, the most 5’ 
position was designated the TSS. Finally, transcripts with identical start sites were 
filtered out, ensuring each annotation is unique. 
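
The TSS adjustment rule described above could be sketched as follows. The function name, the dictionary of capped-read counts and the strand encoding are hypothetical. Keeping the original annotation when no capped reads fall inside the window is an assumption that the text does not state.

```python
def refine_tss(annotated_tss, strand, cap_reads, max_shift=150):
    """Pick the position with the most capped 5' reads within +/- max_shift bp.

    cap_reads maps a genomic coordinate to its capped-read count (hypothetical
    input format). Ties are resolved in favour of the most 5' position, which is
    the lowest coordinate on '+' and the highest coordinate on '-'.
    """
    window = range(annotated_tss - max_shift, annotated_tss + max_shift + 1)
    # Walk the 301-bp window in the 5'-to-3' direction of the transcript.
    positions = window if strand == "+" else reversed(window)
    best_pos, best_count = annotated_tss, 0  # assumption: no reads, no change
    for pos in positions:
        count = cap_reads.get(pos, 0)
        if count > best_count:  # strict '>' keeps the most 5' position on ties
            best_pos, best_count = pos, count
    return best_pos

# Toy example with a tie between positions 950 and 1050.
toy_reads = {1000: 5, 1050: 9, 950: 9}
print(refine_tss(1000, "+", toy_reads), refine_tss(1000, "-", toy_reads))
```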

To derive the metagene profile, we first computed the profile for each gene before 
computing the average. For each gene, the GRO-seq read profile on each strand was 
normalized to total sequencing depth and was smoothed using Gaussian smoothing 
with a bandwidth of 200 bp. To adjust for copy number variations, alignability and 
sequencing biases, the GRO-seq read density was further normalized by the ana- 
logous density of genomic sequencing reads'”. Specifically, each gene was divided 
into 200 bins and the log ratio (base 2) between GRO-seq and genomic sequencing 
read densities was computed for each bin. To avoid ratios becoming infinity when 
the denominator is zero, we applied the common technique of adding a pseudo- 
count (1 in this case) to both numerator and denominator. To average the log ratios 
across genes for the metagene profile, the 5’ end (1 kb upstream of the TSS to 500 bp 
downstream) and the 3’ end (500 bp upstream of the transcript termination site 
(TTS) to 1 kb downstream) were unscaled. The region within the gene body extending 
from 500 bp downstream of the TSS to 500 bp upstream of the TTS was scaled 
to 2 kb (see Fig. 1a). 
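
A simplified sketch of the scaling and averaging steps is given below. It works per position rather than on the 200 bins described above, and it assumes each gene's signal has already been smoothed and extracted from 1 kb upstream of the TSS to 1 kb downstream of the TTS. All function names and the bin-averaging resampler are hypothetical choices, not the authors' implementation.

```python
import math

def log2_ratio(gro, genomic, pseudocount=1.0):
    """Log2 ratio of GRO-seq to genomic read density with a pseudocount."""
    return math.log2((gro + pseudocount) / (genomic + pseudocount))

def rescale(values, n_out):
    """Resample a list to n_out points by averaging equal-width bins."""
    n_in = len(values)
    out = []
    for i in range(n_out):
        start = i * n_in // n_out
        end = max(start + 1, (i + 1) * n_in // n_out)
        chunk = values[start:end]
        out.append(sum(chunk) / len(chunk))
    return out

def metagene_row(signal, flank5=1500, flank3=1500, body_bins=2000):
    """Keep the 5' and 3' ends unscaled and rescale the internal region."""
    body = signal[flank5:len(signal) - flank3]
    if not body:
        raise ValueError("gene too short to have an internal region")
    return signal[:flank5] + rescale(body, body_bins) + signal[len(signal) - flank3:]

def metagene(rows):
    """Average equal-length per-gene rows position by position."""
    return [sum(column) / len(rows) for column in zip(*rows)]

# Toy example with tiny flanks so the result is easy to read.
row = metagene_row([1, 1, 2, 4, 6, 8, 3, 3], flank5=2, flank3=2, body_bins=2)
print(row)
```

With the default arguments every gene yields a row of 1,500 + 2,000 + 1,500 positions, so rows from genes of different lengths can be averaged directly.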

Only genes longer than 2.5 kb were considered to avoid short genes in which the 
5’ peak is difficult to distinguish from the body of the gene. In addition, genes with 
less than one RPKM (reads per kilobase per million)/gene copy in the untreated 
GRO-seq sample were considered unexpressed and thus excluded. In a number of 
genes, the read distribution downstream of the 5’ peak contained high peaks, 
possibly due to unannotated internal TSS that distorted the average profiles. To 
mitigate the effect of these outliers, we removed 5% of the genes in which the 
highest density peak was downstream of the first 500 bp. These genes were not 
removed when computing P values or for other analyses. 
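
The length and expression filters above could be expressed as a small predicate. This is a sketch under stated assumptions: the dictionary keys, the function names and the way copy number enters the calculation (a simple division of RPKM by gene copy number) are illustrative, and the 5% outlier removal is not reproduced here.

```python
def rpkm_per_copy(read_count, gene_length_bp, total_mapped_reads, copy_number):
    """Reads per kilobase per million mapped reads, divided by gene copy number."""
    rpkm = read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
    return rpkm / copy_number

def keep_for_profile(gene, total_mapped_reads, min_length_bp=2500, min_rpkm=1.0):
    """True if a gene is longer than 2.5 kb and has at least 1 RPKM per copy.

    `gene` is a hypothetical dict with 'length', 'reads' and 'copies' keys.
    """
    if gene["length"] <= min_length_bp:
        return False
    value = rpkm_per_copy(gene["reads"], gene["length"],
                          total_mapped_reads, gene["copies"])
    return value >= min_rpkm

# Toy genes evaluated against a library of 10 million mapped reads.
genes = [
    {"name": "long_expressed", "length": 5000, "reads": 50, "copies": 1},
    {"name": "long_silent", "length": 5000, "reads": 10, "copies": 2},
    {"name": "short", "length": 2000, "reads": 500, "copies": 1},
]
kept = [g["name"] for g in genes if keep_for_profile(g, 10_000_000)]
print(kept)
```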

The ChIP-chip metagene profiles (Supplementary Fig. 3) were computed from 
array data by the same scaling method used for the GRO-seq metagene profiles. 
There was no need for further normalization in these profiles because we also 
normalized to array input, thereby controlling for copy number. 

Individual gene profiles (Fig. 2b and Supplementary Figs 4-6) were computed 
in a similar manner to the metagene profiles, only no scaling was performed and a 
100-bp sliding window was used to smooth the reads instead of Gaussian smooth- 
ing. As before, read density was normalized to total sequencing depth and for copy 
number using genomic sequencing reads as in the GRO-seq metagene profile 
calculations. 
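
For the individual gene profiles, the sliding-window smoothing might look like the sketch below. The function name is hypothetical, and truncating the window at the ends of the region (rather than padding) is an assumption.

```python
def sliding_mean(counts, window=100):
    """Mean read count in a window of `window` positions around each base.

    Uses prefix sums so the cost is linear in the number of positions.
    Windows are truncated at the boundaries of the region (an assumption).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)
    smoothed = []
    for i in range(len(counts)):
        lo = max(0, i - half)
        hi = min(len(counts), i + window - half)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed

# Toy example with a 3-bp window so the arithmetic is easy to follow.
print(sliding_mean([0, 0, 6, 0, 0], window=3))
```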

The control/MSL2 RNAi log ratio metagene plot (Fig. 4a) was produced by 
taking the log ratio of the Gaussian-smoothed read densities in MSL2 RNAi and 
control samples across the body of each gene. The log ratios (base 2) were com- 
puted for each gene before scaling (with pseudocount of 1) and then averaged 
across genes (thus, this ratio is not simply the ratio of the profiles in Fig. 3a). 
Overall, higher values in Fig. 4a represent a greater drop in the GRO-seq signal 
after MSL2 RNAi treatment. 
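
A toy calculation shows why averaging per-gene log ratios differs from taking the ratio of two averaged profiles. The densities and function names below are invented for illustration; the ratio is written as control over MSL2 RNAi so that higher values mean a greater drop after RNAi.

```python
import math

def mean_of_log2_ratios(control, knockdown, pseudocount=1.0):
    """Average of per-gene log2(control / knockdown) ratios."""
    ratios = [math.log2((c + pseudocount) / (k + pseudocount))
              for c, k in zip(control, knockdown)]
    return sum(ratios) / len(ratios)

def log2_ratio_of_means(control, knockdown, pseudocount=1.0):
    """log2 ratio of the averaged control and knockdown densities."""
    mean_c = sum(control) / len(control)
    mean_k = sum(knockdown) / len(knockdown)
    return math.log2((mean_c + pseudocount) / (mean_k + pseudocount))

# Two hypothetical genes: a weakly transcribed gene that halves after RNAi
# and a highly transcribed gene that does not change.
control_density = [3.0, 31.0]
rnai_density = [1.0, 31.0]
print(mean_of_log2_ratios(control_density, rnai_density))
print(log2_ratio_of_means(control_density, rnai_density))
```

In the per-gene average each gene contributes equally, whereas the ratio of averaged profiles is dominated by the highly transcribed gene.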
Computing the PI and EdI. To compare the level of RNAP II at the 5’ ends of 
genes compared with that progressing into gene bodies, we defined a ‘pausing 
index’ (PI) as the ratio of 5’ GRO-seq read density within the first 500 bp down- 
stream of the TSS to the read density within the next 25% of the gene body. The 5’ 
read density is calculated as the number of sense-strand reads in the 5’ region 
divided by the number of uniquely mappable positions (as determined using 
PeakSeq**) in this same region. A position is ‘mappable’ if, given only the 36 bp 
sequence at that position, the position in the genome can be uniquely identified. 
Correcting for mappability in this manner prevents regions that have no reads 
because they are unmappable from biasing the analysis. A similar calculation is 
performed to determine the density in the next 25% of the gene. A high PI indicates 
that RNAP II is biased towards the 5’ end. 
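
The PI calculation reduces to a ratio of two mappability-corrected densities, as in this minimal sketch. The function names and the example counts are hypothetical. The mappable-position counts are assumed to come from an external mappability track.

```python
def read_density(sense_reads, mappable_positions):
    """Sense-strand reads per uniquely mappable position in a region."""
    if mappable_positions <= 0:
        raise ValueError("region has no mappable positions")
    return sense_reads / mappable_positions

def pausing_index(reads_5p, mappable_5p, reads_body, mappable_body):
    """Ratio of density in the first 500 bp to density in the next 25% of the gene."""
    return read_density(reads_5p, mappable_5p) / read_density(reads_body, mappable_body)

# Hypothetical gene: 200 reads over 400 mappable bp at the 5' end,
# 50 reads over 1,000 mappable bp in the next 25% of the gene body.
example_pi = pausing_index(200, 400, 50, 1000)
print(example_pi)
```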

To analyse the distribution of active RNAP II within a given gene, we calculated 
an ‘elongation density index’ (EdI) by taking the ratio of the 3’ read density to the 
5’ read density. The first 500 bp of the gene is excluded from this calculation to 
eliminate the effect of the large 5’ peak frequently associated with paused poly- 
merase. The remainder of the gene is then split into two portions, the 5’ region and 
the 3’ region. We state the main results with the 5’ region containing the first 25% 
of the gene (after the first 500 bp) and the 3’ region the remaining 75%, but 
multiple points of division were tested (Supplementary Table 1). The 3’ density 
is calculated as it was done above. A low EdI indicates that RNAP II is biased 
towards the 5’ end, whereas a larger value indicates greater RNAP II towards the 3’ 
end. 
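
The segmentation and the EdI can be sketched in the same style. Coordinates are given relative to the TSS, and the names, the integer rounding of the split point and the example counts are assumptions made for illustration.

```python
import math

def segment_gene(gene_length, peak_bp=500, five_prime_fraction=0.25):
    """Split a gene into the 5' peak, the 5' body region and the 3' body region.

    Returns three half-open (start, end) intervals relative to the TSS.
    """
    remainder = gene_length - peak_bp
    if remainder <= 0:
        raise ValueError("gene is not longer than the excluded 5' peak region")
    split = peak_bp + int(remainder * five_prime_fraction)
    return (0, peak_bp), (peak_bp, split), (split, gene_length)

def elongation_density_index(reads_3p, mappable_3p, reads_5p, mappable_5p):
    """Ratio of 3' read density to 5' read density, excluding the first 500 bp."""
    return (reads_3p / mappable_3p) / (reads_5p / mappable_5p)

segments = segment_gene(4500)
# Hypothetical counts: 60 reads over 1,000 mappable bp in the 5' region,
# 90 reads over 3,000 mappable bp in the 3' region.
example_edi = elongation_density_index(90, 3000, 60, 1000)
print(segments, example_edi, math.log2(example_edi))
```

Changing `five_prime_fraction` corresponds to testing alternative points of division between the 5' and 3' regions.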
The gene set considered in the analysis of EdIs is similar to that used to produce 
the profile plots, except that no outliers were removed and only short genes less 
than 500 bp (instead of 2.5 kb) were excluded. These criteria were relaxed to make 
our analysis more conservative. To avoid outlier ratios that can result from a small 
number of reads, genes with fewer than 3 reads in the first 500 bp of the gene, the 5’ 
region or in the 3’ region were removed. A one-sided Wilcoxon test was used to 
test whether EdIs on the X chromosome are significantly greater than on auto- 
somes in the untreated sample. To compare the elongation density indices for the 
MSL2 RNAi with the control RNAi, the same procedure was followed, except that 
only genes with an EdI defined in both samples were considered. 
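
A one-sided comparison of this kind can be illustrated with a rank-sum test. The sketch below assumes the unpaired (Wilcoxon rank-sum, equivalently Mann-Whitney) form with a tie-corrected normal approximation and a continuity correction. The text does not specify these details, and the EdI values are invented.

```python
import math
from statistics import NormalDist

def rank_sum_greater(x, y):
    """One-sided P value for values in x tending to be greater than those in y."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        size = j - i + 1
        mean_rank = (i + j) / 2 + 1          # average rank for a tie group
        in_x = sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        rank_sum_x += mean_rank * in_x
        tie_term += size ** 3 - size
        i = j + 1
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0                            # all values identical
    z = (u - mean_u - 0.5) / math.sqrt(var_u)  # continuity correction
    return 1 - NormalDist().cdf(z)

# Hypothetical log-scale EdI values for X-linked and autosomal genes.
edi_x = [-0.4, -0.3, -0.5, -0.2, -0.35]
edi_autosomes = [-0.9, -0.8, -1.1, -0.7, -0.95]
print(rank_sum_greater(edi_x, edi_autosomes))
```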

To determine whether removing outliers (as defined previously for the metagene 
profiles) alters our results, we compared EdI ratios (MSL2/control RNAi) with and 
without outlier removal. When outliers were removed, the shift in the distribution 
of EdI ratios on X relative to autosomes remained significant (P value < 1X 10~ 1). 
Likewise, the difference between the EdI distribution on X relative to autosomes in 
the untreated sample remains significant after outlier removal (P value < 0.017 
before removal, P value < 0.020 after outlier removal). Overall, outlier removal has 
little effect on the statistical significance of our EdI comparisons. 

Comparing GRO-seq data with mRNA-seq. To compare our data with previous 
experiments” that measured the effect of MSL2 RNAi on expression levels, we 
examined GRO-seq read densities before and after treatment with MSL2 RNAi. 


Ratios of gene expression levels before and after MSL2 RNAi obtained by RNA- 
Seq experiments’? were compared to analogous GRO-seq ratios. GRO-seq ratios 
were computed only from reads mapping to the gene bodies. The region extending 
from the TSS to 500 bp downstream was excluded from these calculations so that 
the 5’ peak around the TSS would not bias the results. Read densities for each gene 
with at least 10 reads in both the MSL2 RNAi data set and the control RNAi data 
set were normalized to data-set size, and then a ratio was computed. The Pearson 
correlation coefficient between GRO-seq ratios and those derived from RNA-seq 
is highly significant (P value < 1X 10"), but with relatively low absolute mag- 
nitude (R = 0.30). If only X-linked genes are considered, the Pearson correlation 
remains unchanged (R = 0.30) and is still highly significant (P value < 1 X 10- +), 
When a similar comparison was performed between GRO-seq ratios and expres- 
sion array data’, a significant Pearson correlation of R= 0.22 was observed 
(P value <1 X10). 
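
The per-gene GRO-seq ratio used in this comparison could be computed as sketched below. The function name is hypothetical, and expressing the ratio as log2(control / MSL2 RNAi) is an assumption, since the text does not state the direction or whether a logarithm was taken before correlation.

```python
import math

def normalized_log2_ratio(control_reads, rnai_reads,
                          control_total, rnai_total, min_reads=10):
    """log2 ratio of library-size-normalized gene-body read counts.

    Returns None for genes with fewer than min_reads in either data set,
    mirroring the read-count filter described above.
    """
    if control_reads < min_reads or rnai_reads < min_reads:
        return None
    control_norm = control_reads / control_total
    rnai_norm = rnai_reads / rnai_total
    return math.log2(control_norm / rnai_norm)

# Toy gene-body counts with equal library sizes for easy arithmetic.
halved = normalized_log2_ratio(100, 50, 1_000_000, 1_000_000)
too_few = normalized_log2_ratio(5, 50, 1_000_000, 1_000_000)
print(halved, too_few)
```

The resulting per-gene values can then be correlated with the corresponding RNA-seq or expression-array ratios using a Pearson coefficient.
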
The RAG2 C terminus suppresses genomic instability and lymphomagenesis 


Ludovic Deriano, Julie Chaumeil, Marc Coussens, Asha Multani, YiFan Chou, Alexander V. Alekseyenko, Sandy Chang, Jane A. Skok & David B. Roth 


Misrepair of DNA double-strand breaks produced by the V(D)J 
recombinase (the RAG1/RAG2 proteins) at immunoglobulin (Ig) 
and T cell receptor (Tcr) loci has been implicated in pathogenesis 
of lymphoid malignancies in humans’ and in mice” ’. Defects in DNA 
damage response factors such as ataxia telangiectasia mutated (ATM) 
protein and combined deficiencies in classical non-homologous end 
joining and p53 predispose to RAG-initiated genomic rearrange- 
ments and lymphomagenesis”"'. Although we showed previously 
that RAG1/RAG2 shepherd the broken DNA ends to classical non- 
homologous end joining for proper repair’”’’, roles for the RAG 
proteins in preserving genomic stability remain poorly defined. 
Here we show that the RAG2 carboxy (C) terminus, although dis- 
pensable for recombination’*”’, is critical for maintaining genomic 
stability. Thymocytes from ‘core’ Rag2 homozygotes (Rag2c/c mice) 
show dramatic disruption of Tcra/d locus integrity. Furthermore, 
all Rag2c/c p53−/− mice, unlike Rag1c/c p53−/− and p53−/− animals, 
rapidly develop thymic lymphomas bearing complex chromosomal 
translocations, amplifications and deletions involving the Tcra/d and 
Igh loci. We also find these features in lymphomas from Atm™/~ 
mice. We show that, like ATM-deficiency’, core RAG2 severely desta- 
bilizes the RAG post-cleavage complex. These results reveal a novel 
genome guardian role for RAG2 and suggest that similar ‘end release/ 
end persistence’ mechanisms underlie genomic instability and 
lymphomagenesis in Rag2“° p53-/~ and Atm ‘~ mice. 

RAG mutations can cause specific defects in the joining stage of V(D)J
recombination. The ‘dispensable’ RAG2 C terminus (murine amino acids
384-527, the region absent from core RAG2, which spans amino acids 1-383) is
of particular interest: loss of the RAG2 C terminus impairs joining of
substrates, increases levels of double-strand breaks that persist through
the cell cycle, and increases accessibility of the broken DNA ends to
alternative non-homologous end joining. Despite these defects, Rag2c/c mice
are not lymphoma-prone.

We reasoned that Rag2c/c p53−/− double-mutant mice might display genomic
instability and lymphomagenesis, even in the context of intact classical
non-homologous end joining. Consistent with previous reports, our Rag2c/c
mice displayed partial developmental blocks in B and T lymphopoiesis because
of a selective V-to-DJ rearrangement defect (Supplementary Fig. 1). Rag2c/c
animals, observed for up to 1 year, showed no obvious signs of tumorigenesis
(Fig. 1a and data not shown). As expected, approximately two-thirds of
p53−/− mice developed thymic lymphoma at an average age of approximately
23 weeks (mean survival = 22.8 weeks) (Fig. 1a, b). Similar findings in
RAG/p53-deficient mice demonstrate that RAG-initiated double-strand breaks
are not critical initiators of lymphomagenesis in p53-deficient mice. In
sharp contrast, 100% (n = 25) of our Rag2c/c p53−/− mice died within
16 weeks (mean survival = 12.1 weeks) with aggressive thymic lymphomas
(Fig. 1a-c). Tumour cells were highly proliferative and expressed cell
surface CD4 and CD8 (Supplementary Fig. 2), with little or no surface TCR
(CD3ε or TCRβ) (data not shown), indicating that these tumours originate
from immature thymocytes.


Tumours with highly proliferating lymphoblasts were detected in 4- to
6-week-old Rag2c/c p53−/− thymi, but not in other organs (data not shown),
confirming their thymic origin. Rag2c/c p53−/− tumours generally displayed
one or a few predominant Dβ1-Jβ1 or Dβ2-Jβ2 rearrangements, indicating a
clonal or oligoclonal origin (Supplementary Fig. 3).

We next examined genomic stability in lymphomas from Rag2c/c p53−/− mice,
first by analysis of Giemsa-stained metaphase spreads prepared from 12
Rag2c/c p53−/− and two p53−/− thymic lymphomas (Supplementary Table 1).
Wild-type thymocytes showed almost no abnormal metaphases (0-3%)
(Supplementary Table 1). In contrast, p53−/− and Rag2c/c p53−/− tumours
harboured a variety of cytogenetic aberrations (aberrant metaphases: 8-94%),
including aneuploidy, chromosome breaks and chromosome fusions
(Supplementary Table 1). We analysed three Rag2c/c p53−/− thymic lymphomas
using spectral (1790T and 1745T) and G-band (1779T) karyotyping (Fig. 2). We
observed recurrent translocations involving chromosomes that harbour Tcr
(chromosomes 14 and 6) and Ig (chromosomes 12, 6 and 16) loci, suggesting
that these might have been initiated by RAG-generated breaks. Moreover, all
three lymphomas harboured translocations of the Igh locus-containing
chromosome 12 and/or the Tcrα/δ locus-containing chromosome 14, loci that
rearrange in thymocytes. Analysis of lymphoma 1779T revealed a C12;14
translocation (Fig. 2). These results suggest that Rag2c/c p53−/− T cell
tumours harbour clonal translocations involving the Tcrα/δ and Igh loci, as
seen in T-cell lymphomas from patients with ataxia-telangiectasia and
Atm−/− mice, rearrangements not observed in p53−/− lymphomas.

Figure 1 | The C terminus of RAG2 is a tumour suppressor in developing
thymocytes. a, Kaplan-Meier tumour-free survival analysis for cohorts of
control (wild type, n = 12; Rag2c/c, n = 19), p53−/− (n = 32) and Rag2c/c
p53−/− (n = 25) mice. Animals were monitored for 50 weeks. The average age
of death in weeks is shown for p53−/− (22.8 weeks) and Rag2c/c p53−/−
(12.1 weeks) genotypes with the P value determined by a Wilcoxon rank sum
test. b, Tumour spectrum observed for Rag2c/c p53−/− (n = 25) and p53−/−
mice (n = 27). All Rag2c/c p53−/− animals (n = 25) showed enlarged thymus.
p53−/− animals showed enlarged thymus and/or spleen (n = 18) or other
non-lymphoid tumour mass (n = 9). c, Physical appearance of normal thymus
(wild type) and thymic lymphoma (Rag2c/c p53−/−, arrow) of 3-month-old
animals.
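
The Kaplan-Meier tumour-free survival analysis named in Fig. 1a can be illustrated with a short sketch of the product-limit estimator. The function name and the example ages are hypothetical and chosen only to show the calculation; they are not the cohort data from this study.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times  -- age at death or at last observation (for example, in weeks)
    events -- True if the animal died with a tumour, False if it was censored
    Returns a list of (time, survival_probability) at each distinct death time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the fraction of at-risk animals surviving this time point
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Animals that died or were censored at t are no longer at risk
        at_risk -= leaving
    return curve


# Hypothetical cohort: four deaths and one animal still tumour-free at 50 weeks
ages_weeks = [10, 12, 12, 16, 50]
died = [True, True, True, True, False]
curve = kaplan_meier(ages_weeks, died)
```

Censored animals (those still tumour-free when monitoring ends) reduce the number at risk without lowering the survival estimate, which is why the method suits cohorts monitored for a fixed period.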

To confirm the involvement of the Tcrα/δ locus in chromosome translocations,
we performed DNA fluorescence in situ hybridization (DNA FISH) analyses on
metaphases from Rag2c/c p53−/− thymic lymphomas (2489T and 2805T) using
probes centromeric (Tcrα/δ V) and telomeric (Tcrα/δ C) to the Tcrα/δ locus
plus a paint for chromosome 14 (Fig. 3a). In both tumours, breakpoints
within the Tcrα/δ locus of one of the two chromosomes 14 resulted in
amplification of the Tcrα/δ V region (Fig. 3a). The telomeric fragment
(including Tcrα/δ C) was either translocated (2489T) or lost (2805T)
(Fig. 3a). DNA FISH analysis of tumours 1790T and 1779T (from Fig. 2) using
Tcrα/δ C and V probes also confirmed translocation of chromosome 14 with
breakpoints within the Tcrα/δ locus, although without obvious amplification
(Supplementary Fig. 4).

We next performed DNA FISH on Rag2c/c p53−/− thymic lymphomas 2489T and
2805T using probes centromeric (Igh C) and telomeric (Igh V) to the Igh
locus along with a chromosome 12 paint (Fig. 3b). In both lymphomas, one
chromosome 12 showed translocation with another chromosome, with
accompanying loss of both Igh C and V signals (Fig. 3b). This could result
from RAG-induced breaks with loss of the telomeric end of the chromosome
(including Igh V) and loss of the Igh C region by end degradation before
fusion to the partner chromosome, as previously reported in Atm−/− mouse
T cells. Moreover, dual chromosome 12 and 14 paint analysis showed a C12;14
translocation in lymphoma 2489T (Fig. 3b). In contrast to Rag2c/c p53−/−
lymphomas, DNA FISH on metaphases from one p53−/− thymic lymphoma (6960T)
indicated that both Tcrα/δ and Igh loci were intact (Supplementary Fig. 5),
consistent with previous work.

We next performed array-based comparative genomic hybridization (aCGH)
analysis on genomic DNA from five Rag2c/c p53−/− thymic lymphomas (2489T,
2805T, 1348T, 1779T, 1780T). We observed loss or gain of a region within the
Tcrα/δ and Igh loci, reflecting V(D)J recombination (Supplementary Fig. 6).
All five Rag2c/c p53−/− lymphomas examined showed substantial amplification
of a common region on chromosome 14, centromeric of the Tcrα/δ locus
(Supplementary Fig. 6a), in agreement with our FISH analyses (Fig. 3a). We
also observed loss of a common region on chromosome 12, telomeric of the Igh
locus in all five Rag2c/c p53−/− thymic lymphomas analysed (Supplementary
Fig. 6b). Tumours 1779T, 2489T and 2805T also showed loss of a large region
centromeric of the Igh locus, probably reflecting DNA-end degradation before
fusion to the partner chromosome (Figs 2 and 3a, b and Supplementary
Fig. 6b). In contrast, aCGH analysis of p53−/− thymic lymphoma 6960T failed
to reveal amplification centromeric to the Tcrα/δ locus or deletion
telomeric to the Igh locus (Supplementary Fig. 7a, b), in agreement with our
FISH analysis (Supplementary Fig. 5) and previous data.

Blocking lymphocyte development in early stages can lead to persistent RAG
activity, which, in the absence of p53, can provoke lymphomagenesis. To
investigate whether the partial developmental block in Rag2c/c thymocytes is
sufficient to produce genomic instability and lymphomagenesis, we crossed
core Rag1 knock-in animals, which display diminished recombination and a
strong block in B- and T-cell development (Supplementary Fig. 1), into a
p53-deficient background. Rag1c/c p53−/− mice survived to an average age of
18.7 weeks (Supplementary Fig. 8a), barely distinguishable from p53−/−
mice. Also like p53−/− mice, only two-thirds of Rag1c/c p53−/− mice
developed thymic lymphomas (Supplementary Fig. 8b).


Furthermore, metaphase DNA FISH analyses on two Rag1c/c p53−/− thymic
lymphomas (8383T and 8411T) (Supplementary Fig. 9) and aCGH analysis on
genomic DNA from four Rag1c/c p53−/− thymic lymphomas (8315T, 8333T, 8383T,
8411T) (Supplementary Fig. 10) showed no evidence of recurrent
translocations, genomic amplification or genomic deletion at chromosome 14
and chromosome 12. The genomic instability observed in Rag2c/c p53−/− thymic
lymphomas is therefore associated specifically with loss of the RAG2
C terminus, and does not result from the developmental block in core RAG2
homozygotes.

Figure 2 | Rag2c/c p53−/− thymic lymphomas display recurrent translocations
involving chromosomes that harbour antigen-receptor loci. Representative
images of spectral karyotyping (1790T and 1745T) and G-band karyotyping
(1779T) analysis of three Rag2c/c p53−/− T cell lymphomas. Metaphase number
analysed and translocations for each tumour sample are listed in the table.
All three tumours harbour clonal translocations involving chromosomes that
carry Tcr (chromosome 14, Tcrα/δ; chromosome 6, Tcrβ) and/or Ig
(chromosome 12, Igh; chromosome 6, Igκ; chromosome 16, Igλ) loci.

Tumour number   Genotype          Translocations*   Frequency
1790            Rag2c/c;p53−/−    t(9;12)           7/8
                                  t(X;14)           4/8
1745            Rag2c/c;p53−/−    t(16;12)          3/9
                                  t(6;6)            3/9
1779            Rag2c/c;p53−/−    t(12;14)          14/17

Figure 3 | Rag2c/c p53−/− thymocytes display Tcrα/δ- and Igh-associated
genomic instability. a, Top panel: schematic of the Tcrα/δ locus, with
positions of the BACs used for generation of DNA FISH probes indicated.
Bottom panels: representative metaphases from two Rag2c/c p53−/− thymic
lymphomas using the Tcrα/δ V BAC probe (red signal) combined with
chromosome 14 paint (green signal, top row) or with the Tcrα/δ C BAC probe
(green signal, bottom row). Arrows point to the amplification of the
Tcrα/δ V region, arrowheads point to the translocated chromosome 14. b, Top
panel: schematic of the Igh locus, with positions of the BACs used for
generation of DNA FISH probes indicated. Bottom panels: representative
metaphases from the same two Rag2c/c p53−/− thymic lymphomas using the Igh C
BAC probe (red signal) combined with chromosome 12 paint (green signal, top
row) or with the Igh V BAC probe (green signal, bottom row). Combination of
chromosome 12 (red) and chromosome 14 (green) paints is shown for both
tumours in black boxes. Arrowheads point to the translocated chromosome 12.
c, Examples of confocal sections of three-dimensional Tcrα/δ DNA FISH on
freshly isolated wild-type (top row) or Rag2c/c (bottom rows)
double-positive thymocytes. Tcrα/δ V (green) and C (red) BAC probes were
used. Scale bar, 1 µm. d, Representative experiment showing the frequency at
which Tcrα/δ V and/or Tcrα/δ C signals are lost in wild-type (WT), p53−/−
and Rag2c/c thymocytes (n > 200; see Supplementary Fig. 11 for additional
experiments and statistical analysis).

We next asked whether core RAG2 promotes genomic instability in the presence
of p53 by using three-dimensional interphase DNA FISH to examine the
integrity of the Tcrα/δ locus (Fig. 3c) in Rag2c/c double-positive
thymocytes. The two alleles appeared as two pairs of signals (Tcrα/δ V and
Tcrα/δ C, mapping the two ends of the locus) in most
(>98%) wild-type and p53−/− double-positive thymocytes (Fig. 3d and
Supplementary Fig. 11), indicating that p53 deficiency alone does not
disrupt the integrity of the Tcrα/δ locus, as expected. In contrast, Rag2c/c
double-positive thymocytes displayed a three- to fivefold increase in the
number of cells showing loss of at least one signal (Fig. 3c, d and
Supplementary Fig. 11). These results suggest that core RAG2 promotes
genomic instability at the Tcrα/δ locus, a phenotype similar to that
previously reported in Atm−/− and 53bp1−/− animals.

We noted that both Rag2c/c p53−/− and Atm−/− mice feature RAG-dependent
genomic instability at the Tcrα/δ and Igh loci, with development of pro-T
cell lymphomas bearing clonal translocations, including 12/14
translocations. To determine whether Atm−/− thymic lymphomas also harbour
amplification close to the Tcrα/δ locus, we performed DNA FISH analysis for
Tcrα/δ and chromosome 14 on metaphases from one Atm−/− thymic lymphoma
(10375T) (Supplementary Fig. 12a). Both chromosomes 14 showed translocations
with breakpoints within the Tcrα/δ locus, and amplification of the Tcrα/δ V
region on one allele (Supplementary Fig. 12a), results that were confirmed
by aCGH analysis (Supplementary Fig. 12b). We also observed loss of DNA at a
distal region of chromosome 12, near the Igh locus (Supplementary Fig. 12b),
as in Rag2c/c p53−/− lymphomas (Fig. 3a, b and Supplementary Fig. 6). These
data agree with recent analysis of thymic lymphomas from ATM-deficient mice.
Thymic lymphomas that arise in other mutant backgrounds such as p53, core
RAG1/p53 (Supplementary Figs 8-10), Eβ/p53 or H2AX/p53 lack recurrent
amplifications of chromosome 14 regions and/or recurrent chromosome 12/14
translocations, and thus appear to arise from distinct mechanisms.

Our data reveal a novel in vivo function for the RAG2 C terminus in
promoting genomic stability. How does core RAG2 allow genomic instability?
We hypothesized that core RAG2, like the absence of ATM, destabilizes the
post-cleavage complex. To investigate this, we generated RAG-signal end
complexes by in vitro cleavage and challenged them at increasing
temperatures, followed by gel electrophoresis (Fig. 4). Complexes containing
full-length RAG2 did not release 50% of signal ends until 55 °C (Fig. 4b,
c), as expected. In contrast, core RAG2-containing complexes displayed
statistically significant instability at lower temperatures, with 50% end
release at 37 °C (Fig. 4b, c). To examine the post-cleavage complex in vivo,
we analysed inversional recombination, which requires coordination of all
four DNA ends. Decreased inversional recombination and increased formation
of hybrid joints (generated by joining of a coding end to a signal end, in
this case revealing defects in formation of four-ended inversion products)
have been reported in ATM- and MRE11 complex-deficient cells. As expected,
we observed increased hybrid joint formation at the Igκ locus (Vκ6-23 to
Jκ1) in Atm−/− and NbsΔB/ΔB splenocytes (Supplementary Fig. 13).
Importantly, we observed increased Vκ6-23-to-Jκ1 hybrid joints in Rag2c/c
splenocytes, compared with their wild-type and Rag1c/c counterparts
(Supplementary Fig. 13). These results are supported by the observation that
Rag2c/c lymphocytes exhibit defects in inversional recombination. Together,
these data support our hypothesis that core RAG2 impairs the stability of
the RAG post-cleavage complex in vitro and in vivo.

Figure 4 | The C terminus of RAG2 stabilizes the RAG post-cleavage complex.
a, Biochemical end-release assay. Purified glutathione S-transferase
(GST)-tagged core RAG1 and non-tagged RAG2 (full length or core) proteins
(yellow circles) cleave a 500 base pair (bp) DNA substrate at 37 °C.
Post-cleavage signal end complexes are thermally challenged at increasing
temperatures to force the release of signal ends, which are detected after
electrophoresis and gel staining. b, Representative gel for end-release
assays. Numbers above each lane indicate the temperatures (in degrees
Celsius) the reactions were heated to before electrophoresis. CE, coding
ends; SC, single cleavages; PK, samples treated with proteinase K and SDS.
c, Quantification of signal end release, measured as the combined amount of
signal ends divided by the signal from the total amount of DNA in the lane,
from six experiments using two different protein preparations (*P < 0.05,
Student’s t-test).
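
The end-release quantification (signal ends divided by total DNA signal in the lane) and the temperature of 50% release can be sketched as below. This is an illustrative reading of the description only: the function names, the linear interpolation between tested temperatures and the example fractions are assumptions, not the authors' procedure or data.

```python
def release_fraction(signal_end_intensity, total_lane_intensity):
    """Fraction of signal ends released: signal-end band intensity over total lane DNA."""
    if total_lane_intensity <= 0:
        raise ValueError("total lane intensity must be positive")
    return signal_end_intensity / total_lane_intensity


def half_release_temperature(temps, fractions, threshold=0.5):
    """Temperature at which release first reaches threshold.

    Linearly interpolates between adjacent tested temperatures (an assumption
    made for this sketch). Returns None if the threshold is never reached.
    """
    points = sorted(zip(temps, fractions))
    if points[0][1] >= threshold:
        return points[0][0]
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 < threshold <= f1:
            return t0 + (threshold - f0) * (t1 - t0) / (f1 - f0)
    return None


# Hypothetical release fractions at four challenge temperatures (degrees Celsius)
temps_c = [37, 45, 55, 65]
stable_complex = [0.1, 0.3, 0.5, 0.9]
unstable_complex = [0.5, 0.7, 0.8, 0.9]
t_stable = half_release_temperature(temps_c, stable_complex)
t_unstable = half_release_temperature(temps_c, unstable_complex)
```

A lower half-release temperature indicates a less stable post-cleavage complex, which is the comparison drawn between full-length and core RAG2 in the text.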

Our data support a common model for genomic instability in Rag2c/c p53−/−
and Atm−/− mice: premature release of RAG-generated double-strand breaks
from the RAG post-cleavage complex allows ends to escape the normal joining
mechanisms, to persist and to be potentially joined by alternative
non-homologous end joining, a pathway permissive for chromosome
translocations and amplification. Both end release and end persistence are
promoted by ATM deficiency, probably because ATM both stabilizes the RAG
post-cleavage complex and activates p53-dependent checkpoints/apoptosis. In
Rag2c/c p53−/− mice, end persistence might be augmented by ongoing RAG
activity through the cell cycle resulting from impaired degradation of core
RAG2, which lacks the cell-cycle-regulated degradation motif.

The complete penetrance, rapid development of lymphoma and extraordinary
degree of RAG-mediated genomic instability make Rag2c/c p53−/− mice an
attractive model for investigating the spectrum of somatic genome
rearrangements underlying lymphomagenesis.


METHODS SUMMARY 

Mice. Mice were bred in the New York University Specific Pathogen Free facility; 
animal care was approved by the NYU SoM Animal Care and Use Committee 
(protocol number 090308-2). 

Analysis of tumour cells. Lymphoid tumours were analysed by flow cytometry 
with antibodies against surface B- and T-cell markers. Metaphases were prepared 
and analysed as described in Methods. 

FISH and image analysis. DNA FISH was performed using BAC probes as
described in Methods. Interphase FISH was performed on double-positive
thymocytes isolated by cell sorting according to protocols described in
Methods. Images were obtained by confocal microscopy on a Leica SP5 AOBS
system, with optical sections separated by 0.3 µm. Images were analysed
using ImageJ software. Metaphase spreads were imaged by fluorescence
microscopy on a Zeiss Imager Z2 Metasystems METAFER 3.8 system and analysed
using ISIS software. Statistical analysis of image parameters used a
two-tailed Fisher’s exact test.
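
As an illustration of the two-tailed Fisher’s exact test mentioned above, the following minimal sketch computes the P value for a 2x2 table, such as nuclei with and without signal loss in two genotypes. The function name and the example counts are hypothetical; the two-tailed rule used here (summing the probabilities of all tables no more likely than the observed one) is one common convention and is assumed rather than taken from the paper.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more likely than the observed one
    # (small tolerance guards against floating-point ties)
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Hypothetical counts: rows are genotypes, columns are (signal lost, signal intact)
p_value = fisher_exact_two_tailed(3, 1, 1, 3)
```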

Biochemical end-release assay. The stability of RAG-signal end complexes was 
measured as described in Methods. Briefly, RAG cleavage reactions were divided 
into aliquots in microfuge tubes and incubated at the indicated temperatures for 
30 min, followed by polyacrylamide gel electrophoresis. DNA was stained using 
SYBR Safe DNA Gel Stain (Invitrogen) and quantified with Quantity One software 
(Biorad). Student’s t-test assuming equal variance was used to calculate statistical 
significance. 
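
The equal-variance Student’s t-test named above rests on a pooled variance estimate. The sketch below computes only the t statistic and its degrees of freedom; the function name and example values are hypothetical, and converting t to a P value (which needs the t distribution) is left to a statistics package.

```python
import math


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a two-sample t-test with equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    mean_a = sum(sample_a) / n_a
    mean_b = sum(sample_b) / n_b
    ss_a = sum((x - mean_a) ** 2 for x in sample_a)
    ss_b = sum((x - mean_b) ** 2 for x in sample_b)
    df = n_a + n_b - 2
    # Pooled variance: both groups are assumed to share one variance
    pooled_var = (ss_a + ss_b) / df
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical release fractions at one temperature for two protein preparations
t_stat, dof = pooled_t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```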

aCGH analysis. For CGH, genomic DNA from mouse thymic lymphomas was 
profiled against matched thymic DNA from wild-type mice. aCGH experiments 
were performed on two-colour Agilent 244A Mouse Genome Microarrays. Data 
analysis was performed as described in Methods. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Mice. We obtained wild type (Taconic), Rag2c/c, Rag1c/c, p53−/− (Jackson
laboratory) and Atm−/− (Jackson laboratory) mice for this study. Rag2c/c or
Rag1c/c mice were bred with p53-deficient mice to generate doubly deficient
mice. Genotyping of these mutants was performed by PCR of tail DNA as
described in the relevant references.

Characterization of tumour cells and metaphase preparation. Lymphoid
tumours were analysed by flow cytometry with antibodies against surface
B-cell (CD43, B220, IgM) and T-cell (CD4, CD8, CD3, TCR-β) markers. FACS
analysis used a BD LSRII flow cytometer (BD Biosciences) equipped with
FACSDiva and FlowJo. For metaphase preparation, tumour cells were prepared
as previously described. Briefly, primary tumour cells were grown in
complete RPMI media for 4 h and exposed to colcemid (0.04 µg ml−1, GIBCO,
KaryoMAX Colcemid Solution) for 2 h at 37 °C. Then, cells were incubated in
75 mM KCl for 15 min at 37 °C, fixed in fixative solution (75% methanol/25%
acetic acid) and washed three times in the fixative. Cell suspension was
dropped onto pre-chilled glass slides and air-dried for further analysis.

G-banding and spectral karyotyping. Optimally aged slides were treated for
the induction of G-banding following the routine procedure. Spectral
karyotyping was performed using the mouse chromosome SKY probe (Applied
Spectral Imaging) according to the manufacturer’s instructions to determine
chromosomal rearrangements in the tumour samples. The slides were analysed
using a Nikon Eclipse 80i microscope. G-banding as well as SKY images were
captured and karyotyped using an Applied Spectral Imaging system.

DNA FISH probes. BAC probes for the Igh and Tcrα/δ loci were labelled by
nick-translation and prepared as previously described. For the Igh locus,
BAC 199 (Igh C) and BAC RP24-386J17 (Igh V) were labelled in Alexa Fluor 594
and 488 respectively (Molecular Probes). For the Tcrα/δ locus, BAC
RP23-304L21 (Tcrα/δ V) and RP23-255N13 (Tcrα/δ C) were labelled in Alexa
Fluor 488 or 594. StarFISH-concentrated mouse FITC or Cy3 chromosome 12 or
14 paints were prepared following supplier’s instructions (Cambio). BAC
probes were re-suspended in hybridization buffer (10% dextran sulphate,
5× Denhardt’s solution, 50% formamide) or in paint hybridization buffer,
denatured for 5 min at 95 °C and pre-annealed for 45 min at 37 °C before
hybridization on cells.

DNA FISH on metaphase spreads. Slides were dehydrated in ethanol series,
denatured in 70% formamide / 2× SSC (pH 7-7.4) for 1 min 30 s at 75 °C,
dehydrated again in cold ethanol series, and hybridized with probes
overnight at 37 °C in a humid chamber. Slides were then washed twice in 50%
formamide / 2× SSC and twice in 2× SSC for 5 min at 37 °C each. Finally,
cells were mounted in ProLong Gold (Invitrogen) containing
4′,6-diamidino-2-phenylindole (DAPI) to counterstain total DNA.

DNA FISH on interphase nuclei. Double-positive thymocytes were isolated from
total thymi on a Beckman-Coulter MoFlo cell sorter as Thy1.2+CD4+CD8+ cells
using the following antibodies: PE-Cy7-coupled anti-CD90.2 (Thy1.2; 53-2.1),
APC-coupled anti-CD4 (L3T4) and FITC-coupled anti-CD8 (53-6.7). Cells were
washed two times in 1× PBS and dropped onto poly-L-lysine-coated coverslips.
For three-dimensional DNA FISH analyses, we used a protocol for
immunofluorescence / DNA FISH previously described, with the protein
detection step omitted. Briefly, cells were fixed in 2% paraformaldehyde /
1× PBS for 10 min at room temperature, permeabilized in 0.4% Triton / 1× PBS
for 5 min on ice, incubated with 0.01 mg ml−1 RNase A for 1 h at 37 °C and
permeabilized again in 0.7% Triton / 0.1 M HCl for 10 min on ice. Cells were
then denatured in 1.9 M HCl for 30 min at room temperature, rinsed in cold
1× PBS and hybridized overnight with probes at 37 °C in a humid chamber.
Cells were then rinsed in 2× SSC at 37 °C, 2× SSC at room temperature and
1× SSC at room temperature, 30 min each. Finally, cells were mounted in
ProLong Gold (Invitrogen) containing DAPI to counterstain total DNA.

Biochemical end-release assay. End-release assay to measure the stability of
the signal-end complexes was performed as previously described. For
RAG-mediated cleavage, 100 ng of recombination substrate (PCR product from
pJH289) was incubated for 3 h at 37 °C with 200 ng purified RAG protein and
200 ng of purified recombinant HMGB1 in a buffer containing 50 mM HEPES
(pH 8.0), 25 mM KCl, 4 mM NaCl, 1 mM DTT, 0.1 mg BSA, 5 mM CaCl2 and 5 mM
MgCl2. Reactions were then divided into aliquots in microfuge tubes and
incubated at different temperatures, or treated with stop buffer (10 mM Tris
(pH 8.0), 10 mM EDTA, 0.2% SDS, 0.35 mg ml−1 proteinase K (Sigma Aldrich))
for 30 min and then run out on 4-20% acrylamide tris-borate-EDTA (TBE) gels
(Invitrogen).

aCGH analysis. aCGH experiments were performed on two-colour Agilent 244A
Mouse Genome Microarrays. After internal Agilent quality control, the
collected data were background subtracted and normalized using the Loess
method. We used the circular binary segmentation method to define regions of
copy number alteration compared with the control and applied the cghMCR
method for extraction of altered minimum common regions between the samples.
The analyses and visualizations were performed using the R statistical
program.
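
To illustrate the idea of a minimum common region, the sketch below intersects the altered segments called in several tumours and returns the stretches altered in all of them. This is a deliberately simplified stand-in and not the cghMCR algorithm or circular binary segmentation: the function name is hypothetical, segments are assumed to be already called, non-overlapping within a sample, on one chromosome and of one sign (all gains or all losses), and the coordinates are invented for the example.

```python
def common_altered_regions(samples):
    """Return the (start, end) intervals altered in every sample.

    samples -- one list of (start, end) altered segments per tumour
    """
    if not samples:
        return []
    common = sorted(samples[0])
    for segments in samples[1:]:
        segments = sorted(segments)
        merged = []
        i = j = 0
        while i < len(common) and j < len(segments):
            start = max(common[i][0], segments[j][0])
            end = min(common[i][1], segments[j][1])
            if start < end:
                merged.append((start, end))
            # Advance whichever interval finishes first
            if common[i][1] < segments[j][1]:
                i += 1
            else:
                j += 1
        common = merged
    return common


# Hypothetical gained segments (arbitrary coordinates) in three tumours
gains = [
    [(10, 50), (80, 120)],
    [(30, 90)],
    [(20, 45), (85, 100)],
]
shared = common_altered_regions(gains)
```

Requiring a region to be altered in every sample is the strictest possible rule; recurrence-based methods usually accept a region altered in a chosen fraction of samples instead.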
doi:10.1038/nature09804

Growth of graphene from solid carbon sources

Zhengzong Sun, Zheng Yan, Jun Yao, Elvira Beitler, Yu Zhu & James M. Tour

Nature 468, 549-552 (2010)


In this Letter, the assigned values for melamine XPS signals of 284.5 eV
(C 1s) and 395.8 eV (N 1s) were incorrect (Supplementary Fig. 7). Repeating
the melamine spectrum with a graphite additive standard (C 1s = 284.5 eV)
showed that the assignments for melamine should be 286.8 eV (C 1s) with N 1s
in the ring at 397.8 eV and N 1s external to the ring at 398.7 eV.
Therefore, the monolayer N-doped graphene on SiO2 substrates having signals
of 284.5 eV (C 1s) and 399.8 eV (N 1s) remains clearly distinguished from
the melamine starting material. Furthermore, the Li et al.¹ (ref. 29 in this
Letter) assignment for quaternary N was misquoted. Although high-temperature
(1,000 °C) growth favours quaternary N in the N-doped graphene film (this is
supported by ref. 1), our assignment of 399.8 eV (N 1s) for monolayer
N-doped graphene is different from the Li et al.¹ assignment, because their
assignment of 401 eV (N 1s) is for thick N-doped graphene films. In our
experiments, when we used poly(ethylene imine) as a growth source,
multilayer N-doped graphene was attained and similarly gave 401.2 eV (N 1s)
as the assignment. Therefore, either there is a difference between the thick
film assignments of Li et al.¹ and our assignment of monolayer N-doped
graphene on SiO2 substrates, or the N signals in our material are more
pyridinic and/or pyrrolic in content². We thank T. Susi for bringing this to
our attention.


1. Li, X. et al. Simultaneous nitrogen doping and reduction of graphene
oxide. J. Am. Chem. Soc. 131, 15939-15944 (2009).

2. Pels, J. R., Kapteijn, F., Moulijn, J. A., Zhu, Q. & Thomas, K. M.
Evolution of nitrogen functionalities in carbonaceous materials during
pyrolysis. Carbon 33, 1641-1653 (1995).
The Frontenac Hotel in Quebec City. The city hosts part of Quebec’s optics and photonics research cluster.


Quebec’s research ambitions


Some science fields in Canada’s second most-populous
province are booming; others are stagnating.

BY HANNAH HOAG


The Pavilion of Optics and Photonics at Laval University in Quebec City,
Canada, resembles other university construction projects of recent years:
grey, boxy and modernist. Yet, despite its sober design, the facility is one
of the main reasons that physicist Younés Messaddeq moved to the area from
Brazil last year. The 5,000-square-metre building, which opened in 2006, is
devoted to the development and testing of lenses, optical coatings and fibre
optics made of silica and exotic glasses. The building’s design minimizes
dust contamination, isolates the laboratories from vibration and controls
temperature and humidity. Messaddeq says that the facility’s participation
in Quebec province’s multi-institutional Centre for Optics, Photonics and
Lasers (COPL), the largest optics and photonics university research centre
in Canada, as well as its modern equipment and laboratories, the depth and
breadth of technical expertise among its employees and its close ties with
the local photonics industry will allow him to develop complex optical
devices quickly. “I’ve worked around the world and I’ve never seen a
facility like this,” he says.
Messaddeq took up his position as a professor 
of physics and optics at Laval after unemploy- 
ment in Quebec peaked at 9.1% in late summer 
2009. In another field, the timing might have 
been unfortunate, with industry partners hurt- 
ing. But Quebec’s photonics sector has fared 
well throughout the economic downturn, and 
still offers career opportunities in both indus- 
try and academia. By contrast, biomedicine in 
the province has partially withered, with parts 
of the biopharmaceutical sector shrinking and 
laying off workers, but bright spots remaining in 
areas such as genomics, proteomics and medi- 
cal imaging. Even so, tax incentives, targeted 
research funding, infrastructure investments 
and a solid industry presence make Quebec a 
destination for many young scientists, including 
foreigners — especially those willing to embrace 
a new culture and possibly a new language. 


PHOTONICS LEGACY 

For decades, academic and industrial insti- 
tutions in Quebec have excelled in optics 
and photonics, specializing in defence and 
telecommunication and expanding into 
remote-sensing and biophotonics. According 
to a 2009 survey by the Canadian Photonics
Consortium, 104 photonics companies and
4,750 employees generated Can$600 million
(US$609 million) in revenue. It’s the
second-largest research and development
photonics cluster in Canada, beaten only
by Ontario, which has 117 companies, 10,200
employees and Can$3 billion in revenue, notes
Michel Tétu, chief executive of the Quebec
Photonic Network in Montreal.

Large and small companies in the Quebec 
City region are all actively looking for research 
scientists with graduate degrees in laser pho- 
tonics, chemistry or material sciences. Fernand 
Sylvain, co-founder and vice-president of oper- 
ations at CorActive, a Quebec company that 
manufactures speciality fibres, says that his firm 
has grown by about 10% a year, even during the 
recession, generally hiring a scientist and two 
technicians each year. He hopes to have hired 
five more employees by the end of 2011. André 
Fougeres, director of programme management 
at the National Optics Institute (INO), a design 
and development company, says that it employs 
240 people and plans to double in size by 2016. 
Fougeres says that the INO is looking for scien- 
tists with backgrounds in microfabrication and 
biophotonics. “But we’re really looking for people
who are application driven,” he says. “They
have to be willing to take risks, to jump on a
project without all the details. They have to be
entrepreneurial.”

The high concentration of companies makes 
Quebec a good place for technology transfer, 
says Messaddeq. The Pavilion of Optics and 
Photonics is the administrative centre for the 
Canadian Institute for Photonics Innovations, 
a network of centres of excellence that brings 
together university, government and indus- 
try researchers. Messaddeq, whose research 
focuses on laser technologies, hopes to launch 
spin-off companies and technologies in the 
next five years, based on either university 
research or collaborations with the INO. 

Messaddeq came to Laval through the first 
round of the Canada Excellence Research 
Chairs (CERC) programme. An initiative of 
Canada’s three major federal funding agencies, 
the programme grants universities Can$1.4 mil- 
lion a year for seven years for each chair that it 
supports. Of the 19 CERCs awarded in 2010, 
three went to universities in Quebec. The next 
competition will be announced in 2015. 

Federally funded programmes such as the 
CERC and the Canada Research Chairs pro- 
gramme, which the government started in 
2000 and which grants Can$300 million a 
year, aim to attract and retain “the world’s most 
accomplished and promising minds”, and have
been key to bringing early- and mid-career sci- 
entists to Quebec. Of the 1,845 CRCs awarded 
by November 2010, 30% went to academics 
recruited from outside Canada, including 
expatriates, and 20% were awarded to health, 
natural-science and engineering researchers at 
Quebec universities. Postdoctoral fellows have 
several possible sources of income, including 
federal and provincial funding agencies and 
the prestigious Banting Postdoctoral Fellow- 
ships, worth Can$70,000 a year for two years, 
for Canadian and international researchers. “It
would be harder to recruit top-notch research-
ers to Quebec without these programmes,” says
Paul Fortier, vice-rector of research and inno- 
vation at Laval. 


BIOTECH STUMBLING BLOCKS 

The biopharmaceutical industry has long been 
a staple of Quebec’s science community, but 
it hasn't fared as well as optics and photonics. 
Clusters of pharmaceutical and biotechnol- 
ogy companies, including spin-offs and multi- 
nationals, have assembled around Montreal. 
As of 2010, Quebec hosts 150 pharmaceutical, 
contract research and biotechnology compa- 
nies, and about 18,600 people are employed in 
the field. Many companies set up in the province 
because of competitive tax incentives for com- 
panies and tax holidays for foreign researchers 
—a five-year tax break on 75% of their personal 
income while they participate in research and 
development activities at a corporation. 

But in the past decade the industry has 
sagged. In 2001, Quebec had 110 health biotech 
research and development companies. By 2008, 
the latest date for which statistics are available, 
that number had dropped by almost half.
Venture capital, which fuels such start-ups, has
been drying up around the world, and Quebec
is no exception. In Montreal, the most
conspicuous loss was the closure of the Merck
Frosst Centre for Therapeutic Research, a
research lab for the pharmaceutical giant Merck.
When the drug-maker closed the facility in July
2010, it laid off most of its nearly 200 employees.

These setbacks have come despite a decade
in which Quebec
has developed a genomics research niche.
Montreal’s McGill University and Génome 
Québec Innovation Centre, a high-through- 
put research facility, opened in 2002, and has 
contributed to, among other things, the Hap- 
Map project, a study of genetic diversity. Last 
month, McGill repatriated Mark Lathrop, a 
biostatistician from the Center for the Study 
of Human Polymorphisms in Paris, to lead the 
centre. His appointment includes a Can$5- 
million budget that can be used to help recruit 
more scientists. 


EARLY-CAREER OPPORTUNITIES 

Despite niche opportunities, recruiting and 
retaining early-career scientists from non- 
Francophone countries remains a challenge. 
Only three of Quebec’s universities teach 
mainly in English. Those who don't speak 
French can still find a spot at a French-speak- 
ing university, but may have to forgo a year 
or two of teaching until their language skills
catch up. When Huixiang Xie, a marine chem- 
ist, was hired by the Institute of Marine Sci- 
ences in Rimouski soon after he finished his 
postdoc at the Woods Hole Oceanographic 
Institution in Massachusetts and the US Envi- 
ronmental Protection Agency, he didn’t know 
any French. “Frankly, I wasn’t worried about 
it. I'd learned English in China, and I thought 
learning French would be similar,” he says. Even 
with a tutor, he struggled, but his colleagues and 
neighbours have helped him improve to the 
point where he can teach in French. 

Concordia University in Montreal prepares 
graduate students and postdocs for work in 
Quebec by offering French-language training 
as part of a suite of professional-development 
workshops. “It provides highly qualified per- 
sonnel with job-ready skills that allow them 
to move into the academic, private or public 
sector workforce,” says Graham Carr, dean of
graduate studies at Concordia. Further work- 
shops are being planned in entrepreneurship, 
communication skills and research ethics. 

A move to Quebec can also present a cultural 
adjustment. “My first night, I told myself, ‘I can’t
do this. It’s a different culture and a different
country’,” recalls neuroscientist David Colman,
recruited in 2002 to head the Montreal Neuro- 
logical Institute and Hospital, known as the 
Neuro. “But by the third day, after I’d met with
the researchers, I realized there were a lot of 
things I could do here that I couldn't do in New 
York,” he says. Colman was lured from the 
Mount Sinai School of Medicine with the prom- 
ise of launching a multimillion-dollar neuro- 
engineering research programme and making 
a neuroscience IMAX movie for pre-teens. 

Since Colman’s arrival, the Neuro has hired 
16 faculty members, mostly at the assistant 
and associate professor level, recalling Cana- 
dians from abroad and attracting many inter- 
national scientists. The CRCs and the Canada 
Foundation for Innovation, an independent 
corporation in Ottawa, Ontario, created by the 
government to fund research infrastructure 
and recruit scientists, helped the Neuro to get 
the scientists it wanted, says Colman. It has also 
hired radiochemists, pathologists and positron 
emission tomography (PET) physicists, and 
will soon recruit a magnetoencephalographist 
to round out the expansion of the McConnell 
Brain Imaging Centre, which is doubling its size 
to 4,600 square metres and adding two mag- 
netic resonance imaging scanners, a PET scan- 
ner and a magnetoencephalography system. 

“The US funding environment and the fall- 
ing success rates have made it easier for us to 
recruit from the United States,” says Bruce 
Pike, the centre's director. 

Colman recommends taking a chance on 
Quebec. “You can have a curiosity-driven 
career here, make a good living and do it all.”


Hannah Hoag is a freelance writer in 
Montreal, Canada. 
TURNING POINT


Louise Glass 


In January, Louise Glass, a microbiologist 
specializing in fungi at the University of 
California, Berkeley, was awarded a fellowship 
from the Adolph C. and Mary Sprague Miller 
Institute for Basic Research in Science, which 
offers opportunities for Berkeley faculty 
members and students to explore creative 
research projects. 


Why did fungi captivate you? 

They are weird and eclectic, with almost 
other-worldly life cycles that we simply 
don’t understand very well. As a result, they 
are interesting organisms through which to 
explore fundamental biology. 


Which fungal species do you work on? 
Neurospora crassa, a filamentous fungus 
associated with a long history of biochemical 
and genetic laboratory techniques. It was the 
perfect organism for me to study because it 
married my interests in fungi with my apti- 
tude for genetics. For 20 years, I’ve used it as 
a model system to understand cell signalling 
and communication. 


Have you always aspired to be an academic 
scientist? 

No. I studied mycology as an undergradu- 
ate at Colorado State University in Fort 
Collins. After college, I worked as a mycolo- 
gist at the American Type Culture Collection, 
a non-profit biological resource centre in 
Manassas, Virginia. There, I had direct inter- 
actions with PhD scientists who encouraged 
me to continue my schooling. I didn't have 
any female role models as scientists, but, with 
continued encouragement from mentors, I 
ended up pursuing a PhD in plant pathology 
at the University of California, Davis. 


How did you get interested in bioenergy? 

It was an unexpected series of events. After 
my PhD I kept track of plant-degradation 
research. The Energy Biosciences Institute 
(EBI), a joint venture between the University
of California, Berkeley, and energy company 
BP, formed on campus in 2007 and estab- 
lished a focus on Miscanthus, a tall perennial 
grass that is related to sugar cane — the crop 
from which most Neurospora isolates come.
Although little was known about Neurospora’s 
ability to degrade plant cell walls, the EBI 
funded a proposal to genetically profile this 
fungus growing on Miscanthus. It has worked 
out well. We discovered several genes previ-
ously not known to be associated with cell-
wall degradation, and have helped industry
leaders to engineer a cheaper fermentation
process yielding higher ethanol production. 


What does the Miller award allow you to do? 
It is difficult to take a sabbatical at Berke- 
ley at present because our department will 
typically not fund a replacement lecturer for 
courses. The Miller award pays your salary to 
the department so that they can hire some-
one to teach your courses. I feel like I’m due
a sabbatical, so this is a perfect time for a new
project. I’m looking forward to the luxury of 
being able to give my undivided attention to 
this topic when my fellowship starts in 2012. 


Do you consider the bioenergy research to
be a career turning point?

Yes. I am now able to explore applica- 
tions of my interests in the basic biology of 
filamentous fungi. Interestingly, a synergy is 
developing between the bioenergy work and 
other research in the lab. For example, we want 
to understand the cell-signalling pathways 
that allow Neurospora to regulate the secre- 
tions of enzymes that degrade plant cell walls. 


What has been the biggest change in science 
during your career? 

The pace. When I was a graduate student, a 
postdoc across the hall from me sequenced 
one kilobase of DNA. We have just finished
sequencing the 40-megabase genome of 100
wild Neurospora isolates. In this day and age, 
it is so easy to get data. The advantage is being 
able to ask very elegant questions because 
you are not limited by data. But it is also easy
to lose sight of the biological problem you are
trying to address. That is the danger.


INTERVIEW BY VIRGINIA GEWIN 
BIOINFORMATICS
UK data-project hiring 


A pan-European biomolecular-data 
storage and access system based in Britain 
is to hire up to 100 bioinformaticians, 
biocurators and software developers
from 2012. The European Life Science
Infrastructure for Biological Information 
(ELIXIR) will standardize and boost the 
quality and quantity of genomics and 
systems-biology data. Funds will come 
from the UK Large Facilities Capital
Fund, but ELIXIR’s business plan must be
approved by the UK Treasury, says Cath 
Brooksbank, head of outreach and training 
at the European Bioinformatics Institute 
(EBI) in Hinxton, UK, which coordinates 
ELIXIR. The EBI is seeking funding from 
the European Commission for recruitment 
and operation. 


UNIVERSITIES 
Academia feels crunch 


US university presidents and executives 
had a median pay rise of 1.4% in 2010, less 
than the inflation rate of 1.6%, says a survey 
published on 21 February. The College
and University Professional Association
for Human Resources (CUPA-HR)
in Knoxville, Tennessee, polled 1,256
institutions for its Administrative 
Compensation Survey Report. Some 14% 
of heads of single-campus universities got
a bonus in 2010. Respondents foresaw
restricted hiring for 2011, with fewer than 
1% expecting to fill “significantly more” 
posts than last year. Andy Brantley, head of 
CUPA-HR, says that private institutions are 
planning hires according to conservative 
predictions of enrolments and revenue, 
and public ones are struggling because of 
state governments’ economic woes. 


EUROPE 
Rules restrict income 


European governments must ease rules to 
let universities seek extra income, says a 
report released on 22 February. According 
to the European University Association 
(EUA) in Brussels, institutions face 
financial uncertainty as public funding 
shrinks. The group surveyed more than 
150 universities across 27 nations; 61% said 
that regulations bar them from industry 
partnerships, spin-out opportunities
and participation in European research
programmes. Thomas Estermann, EUA 
head of governance, autonomy and 
funding, says that as universities find new 
revenue, early-career researchers must 
aggressively seek external funding. 


3 MARCH 2011 | VOL 471 | NATURE | 127 


GREEN FUTURE

But is it art?


BY DEBORAH WALKER


“Hey, Mrs M.”
Miriam ignores the boy. She
pushes her way through the tangled
undergrowth of Trafalgar Square, past
the stone lions with their impassive eyes
virtually obscured by their liana manes. She
must remember to bring a pair of shears
tomorrow.

“Hold on. Hold on.” The boy skips
through the waist-high vegetation. He'll
catch her soon enough. He’s a kid from the
Bloomsbury favela, born and bred to Lon-
don’s jungle. His name is Crich, and he’s been
bothering Miriam for a couple of months.
Rather than let him overtake her, Miriam
waits for him. Crich runs up.
His smile gleams in his green-
flecked face. Miriam’s face is
clean; she washes herself prop-
erly, every morning.

“Hey, Mrs M. What are you
doing? You cleaning, again?”

“Yes, Crich. I’m cleaning,
again. Somebody’s got to do it.”

Miriam has always been a
cleaner. A long time ago, she’d
worked in an office. Now, she
has a more important cleaning
job.

“You sure like cleaning.”

“I do.”

“Mama says that you should
come home and live with us.”

“Does she, indeed?”

Mama isn’t his real mother. The favelas
have developed their own customs. Mama
is the head of Crich’s tribe.

“Can I come with you, Mrs M?”

“Can I stop you?”

He grins. They cross the square and climb
the steps of the National Gallery. Crich chats
about the comings and goings of his tribe: a
girl he likes; a cache of food tins the scouts
had found squirrelled in a forgotten base-
ment; a trade agreement with the Islington
favela. Miriam thinks about which picture
she should restore. She flicks through the
catalogue of her mind and decides upon the
Venus. When they reach the top of the stairs,
Miriam pauses, to catch her breath.

“You alright, Mrs M?” asks Crich.

“I’m not as young as I used to be,” says
Miriam with a wry smile. She sighs and rubs
her hip. “No rest for the wicked,” she mutters.
“Come on, young man, let’s get going.” She
collects her cleaning equipment, which she
keeps hidden behind the customer informa-
tion desk.

“Which picture, today?” asks Crich. 

“Velazquez’s Venus, it’s on the ground 
floor.”

“I know the way,” says Crich. He steps
quickly through the gallery. Miriam follows
him. She walks carefully, trying to minimize
the pain in her hip. Miriam sighs when she sees
the family of green-black rats, chattering in 
the corner of the room. “Go on. Clear out of 
here.” She snaps a tea-towel at the nest, and 
the rats scurry out of the gallery. 

“There's good eating on those things,” 
remarks Crich. Miriam ignores him. She 
places the folding table close to the Venus, 
and sets out the tray. She pours a quantity
of bleach solution into the tray and wets her
cotton rag. Slowly, she applies the weak- 
water bleach to the old master and wipes 
away the film of algae. Slowly, the pink flesh 
of the reclining Venus is revealed. 

“She sure is fat,” says Crich.

“She is.” 

“But she's pretty.” 

“She's clean now.” Miriam steps back to
admire her work. 

“Clean, but not for long,” says Crich. 

Miriam’s work is done for the day. She will 
return home, and think about things. Think 
about things past. 

“I want to show you something,” says 
Crich. He takes a sheet of paper from his bag 
and hands it to Miriam. 

Miriam stares at the 
smudge of red-bloom for a mouth, weaving
tendrils for the hair. 

“It’s a picture of Mama,” explains Crich.

“Did you make this, Crich?” 

“Sure did. Everyone's making them. Mine's 
not so good, really. You should see some of 
the others.” Miriam stares at the picture. Art 
out of algae. 

“Mama doesn't like you out here, living by 
yourself,” says Crich. “She says you should be
with your family.” 

“I need to be close to the gallery.”

“The thing about my picture, Mrs M, is 
that it grows. It changes. New colonies of 
air-borne algae adhere to the surface. That 
golden web of tendrils across her cheek 
wasn't there yesterday.” 

Miriam hands the picture 
back to Crich. “I'm sure it’s very 
nice, but I’ve never really cared 
for modern art.” Miriam packs 
up her cleaning tools. 

“You'll come home with me 
today,” says Crich. It’s a state- 
ment, not a question. 

“But I’ve got to clean the 
paintings. Somebody’s got to 
do it.”

“No, Mrs M. We've got our 
own art, now. You'll come 
home with us.” He looks at her 
and grins. “And we sure could 
do with someone who likes to 
clean up.”

“Somebody’s got to clean the 
pictures.” 

“No more old pictures,” says Crich. There’s 
insistence in his voice. Miriam realizes that 
Crich is older than she thought. When did 
he get so old? And when did she get so old? 

“Let me have another look at that picture 
of yours.” Crich passes her the portrait.

“Will you come home with me? Please, 
Mrs M.”

“Maybe I will come — just for a short 
visit,” she says. Miriam looks at the reclin-
ing Venus. She was so beautiful, but was
it time to leave her behind? Perhaps there 
were other things to see in this hot, green 
world. 

“Let’s go home, Mrs M.” Crich picks up
Miriam’s cleaning equipment. He gently 
takes her arm, and he leads her, slowly, out 
of the National Gallery.